Abstract

A new generation of efficient, parallel, multi-scale, and interdisciplinary ocean
models is required for better understanding and accurate predictions. The purpose of
this thesis is to quantitatively identify promising numerical methods that are suitable
for such predictions. To fulfill this purpose, current efforts towards creating
new ocean models are reviewed, an understanding of the most promising methods used
by other researchers is developed, the most promising existing methods are studied
and applied to idealized cases, new methods are incubated and evaluated by solving
test problems, and important numerical issues related to efficiency are examined.

The results of other research groups towards developing the second generation of
ocean models are first reviewed. Next, the Discontinuous Galerkin (DG) method for
solving advection-diffusion problems is described, including a discussion of schemes
for solving higher order derivatives. The discrete formulation for advection-diffusion
problems is detailed and implementation issues are discussed. The Hybrid
Discontinuous Galerkin (HDG) Finite Element Method (FEM) is identified as a promising
new numerical scheme for ocean simulations. For the first time, a DG FEM scheme is
used to solve ocean biogeochemical advection-diffusion-reaction equations on a
two-dimensional idealized domain, and p-adaptivity across constituents is examined. Each
aspect of the numerical solution is examined separately, and p-adaptive strategies are
explored. Finally, numerous solver-preconditioner combinations are benchmarked to
identify an efficient solution method for inverting matrices, which is necessary for
implicit time integration schemes. From our quantitative incubation of numerical
schemes, a number of recommendations on the tools necessary to solve dynamical
equations for multiscale ocean predictions are provided.

Thesis  Supervisor:  Pierre  F.  J.  Lermusiaux 

Title:  Associate  Professor  of  Mechanical  Engineering 
Glossary


$\mathcal{E}_h$  The set of discretized edges

$\mathcal{E}_h^{\partial}$  The set of discretized edges on the boundary of the domain

$\mathcal{E}_h^{o}$  The set of discretized edges on the interior of the domain

$\theta$  A generic (modal or nodal) basis function

$\Omega$  The domain of interest

$\partial\Omega$  The boundary of the domain of interest

$\phi$  A nodal basis function

$\psi$  A modal basis function


$\mathbf{C}$  Convection (or stiffness) matrix, where $\mathbf{C}_{ij} = \int_K \theta_i(\mathbf{x}) \nabla\theta_j(\mathbf{x})\, dK$

$F$  The functional form of the flux

$K$  A single element in the triangulation

$\partial K$  The boundary of a single element in the triangulation

$\mathbf{M}$  The mass matrix, where $\mathbf{M}_{ij} = \int_K \theta_i \theta_j\, dK$

$\mathbf{n}$  The unit normal vector pointing out of the domain

$\mathcal{P}^p$  Set of polynomials of order p

$\mathbf{q}$  The scaled gradient of $u$, that is, $\mathbf{q} - \kappa \nabla u = 0$

$R$  The residual

$S$, $\mathbf{S}$  The scalar and vector functional forms of the source term

$\mathcal{T}_h$  The discretized triangulation

$u$, $\mathbf{u}$  The unknown scalar or vector, respectively

$\mathbf{v}$  Vector weighting (or test) function

$\mathcal{V}$  Generalized Vandermonde matrix

$w$  Scalar weighting (or test) function

$\mathbf{x}$  Spatial coordinates


ADR  Advection-Diffusion-Reaction

BiCGSTAB  Bi-Conjugate Gradient STABilized

CG  Continuous Galerkin

CDG  Compact Discontinuous Galerkin

CFD  Computational Fluid Dynamics

CFL  Courant-Friedrichs-Lewy: Numerical stability condition

CGS  Conjugate Gradient Squared

DG  Discontinuous Galerkin

DOF  Degree(s) of Freedom

FD  Finite Difference

FEM  Finite Element Method

FV  Finite Volume

GCM  General Circulation Model

GMRES  Generalized Minimum RESidual

GS  Gauss-Seidel

h-adaptive  Mesh adaptation strategy based on refining/coarsening elements

HDG  Hybrid Discontinuous Galerkin

HS  Hydrostatic

IBM  Immersed Boundary Method

ILU  Incomplete Lower Upper factorization

IP  Interior Penalty method

LDG  Local Discontinuous Galerkin

LU  Lower Upper factorization

MG  Multi-Grid

MPI  Message Passing Interface: A standard for parallel message-passing programming

MWR  Method of Weighted Residuals

NHS  Non-Hydrostatic

NPDZ  Nutrient-Phytoplankton-Detritus-Zooplankton: A four-component biological model

NPZ  Nutrient-Phytoplankton-Zooplankton: A three-component biological model

p-adaptive  Mesh adaptation strategy based on increasing/decreasing the polynomial order of basis functions

PE  Primitive Equations

QMR  Quasi-Minimum Residual

RK  Runge-Kutta: A time discretization scheme

RKDG  Runge-Kutta Discontinuous Galerkin

S-coordinates  Sigma coordinates: A terrain-following vertical discretization scheme

SSP  Strong Stability Preserving: Type of RK scheme

SWE  Shallow Water Equations

WHOI  Woods Hole Oceanographic Institution

Z-coordinates  A stair-case vertical discretization scheme
Chapter 1


Introduction 


The impact of human activities on the ocean and lakes is becoming increasingly
global. To successfully coexist with the ocean and utilize marine resources,
civilization needs to monitor and predict our natural environment. A new generation of
efficient, parallel, multi-scale, and interdisciplinary ocean models is required for better
understanding and accurate predictions. There is a rich spectrum of needs for ocean
modeling, including climate dynamics, the sustenance of life on Earth, coastal ocean
and fisheries management, biological production and ecosystem dynamics, efficient
maritime route planning, hazardous spills dispersion, and underwater sound
propagation for efficient naval operations. Ocean prediction is a challenging problem due
to its multi-disciplinary and multi-scale nature, and due to the constraint of real-time
predictions. Depending on the phenomena being examined, space scales can vary
from millimeters to planetary, and time scales can vary from seconds to millennia.
Also, for accurate simulation results, efficient nonlinear assimilation of data
into ocean models and estimation of the most useful data using adaptive sampling are
required.

The MIT “Multidisciplinary Simulation, Estimation and Assimilation System”
(MSEAS) (Web-MSEAS, 2009) includes the primitive equation code of the Harvard
Ocean Prediction System (HOPS) and other computational systems: a nested
data-assimilative barotropic tidal prediction system (Logutov and Lermusiaux, 2008), a
coastal objective analysis scheme, the Error Subspace Statistical Estimation (ESSE)
system for data assimilation (Lermusiaux, 1999), optimization (Heaney et al., 2007)
and adaptive sampling (Lermusiaux, 2007), novel Objective Analysis schemes
(Agarwal, 2009), multiple biological models (Besiktepe et al., 2003) and several acoustic
models (Robinson and Lermusiaux, 2003). This system is being used for realistic
simulations and real-time forecasts in many regions of the world’s ocean. At the heart of
this system is a free-surface hydrostatic primitive equation model with new two-way
nesting capabilities. These capabilities have been used in real-time experiments since
2001 to improve the resolution accuracy in selected regions with minimal modification
and run-time expense.

One of the goals of the MSEAS group is to utilize and develop new numerical
methods for ocean predictions. In the past decade, new numerical algorithms have
been developed, not only for computational fluid dynamics, but also for chemical
and biological dynamics. It is now possible to research the next generation of ocean
prediction models that build upon progress made in these other research fields,
leading to a better understanding of interdisciplinary ocean dynamics. Ocean-specific
numerical research includes: fully coupled physical, biological and acoustic modeling;
multi-scale models; unstructured spatial grids; distributed ocean modeling; embedded
models; high-order schemes; as well as self-modifying models that adapt to data and
learn proper parameterizations and parameters.

The purpose of this thesis is to identify promising numerical methods that are
suitable for ocean predictions. To fulfill this purpose, current efforts towards
creating new ocean models are reviewed, an understanding of the most promising
methods used by other researchers is developed, new methods are investigated and
demonstrated by solving a test problem, and important numerical issues related to
efficiency are examined. The Discontinuous Galerkin (DG) Finite Element Method
(FEM) is identified as a promising new numerical scheme for ocean simulations. The
DG FEM is used to solve biogeochemical advection-diffusion-reaction equations on a
two-dimensional idealized domain, and p-adaptivity across constituents is examined.
Finally, the efficient inversion of the linear discrete operator using iterative solvers is
explored. This thesis develops the tools necessary to solve the dynamical equations
for ocean predictions.


1.1  Thesis  Organization 

Chapter 2 reviews the work done by numerous groups towards developing the
second generation of ocean models. The model developed by each group is briefly
summarized, and all the models are compared and grouped. Chapter 3 describes the
Discontinuous Galerkin (DG) method for solving advection-diffusion problems,
detailing the discrete formulation and discussing implementation issues. The new Hybrid
Discontinuous Galerkin method for solving higher order derivatives is also briefly
discussed. Chapter 4 demonstrates the solution of biogeochemical reaction equations
on two-dimensional unstructured grids using DG. Each aspect of the numerical
solution is examined separately, and p-adaptive strategies are explored. Chapter 5
benchmarks numerous solver-preconditioner combinations to identify an efficient
solution method for inverting matrices, which is necessary for implicit time integration
schemes. Finally, Chapter 6 summarizes the conclusions and makes recommendations
on how to proceed.
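
To give a concrete sense of the solver-preconditioner pairing that Chapter 5 benchmarks, the following minimal sketch performs one backward-Euler step of a 1D diffusion problem and solves the resulting linear system with the conjugate gradient method, with and without a Jacobi (diagonal) preconditioner. It is an illustration under stated assumptions, not the benchmark used in this thesis: the finite-difference discretization, the variable diffusivity profile, and all function names (`build_system`, `matvec`, `jacobi`, `pcg`, `relative_residual`) are hypothetical choices made for the example.

```python
import math


def build_system(n, dt, dx, kappa):
    """Tridiagonal backward-Euler matrix (I - dt*L) for 1D diffusion.

    kappa holds n + 1 face diffusivities (an assumed profile). Homogeneous
    Dirichlet boundaries are assumed, so the matrix is symmetric positive definite.
    """
    r = dt / dx ** 2
    lower, diag, upper = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        diag[i] = 1.0 + r * (kappa[i] + kappa[i + 1])
        if i > 0:
            lower[i] = -r * kappa[i]
        if i < n - 1:
            upper[i] = -r * kappa[i + 1]
    return lower, diag, upper


def matvec(A, x):
    """Multiply the tridiagonal matrix A = (lower, diag, upper) by x."""
    lower, diag, upper = A
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(1, n):
        y[i] += lower[i] * x[i - 1]
    for i in range(n - 1):
        y[i] += upper[i] * x[i + 1]
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def jacobi(A):
    """Return a function that applies the inverse of diag(A) to a vector."""
    diag = A[1]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def pcg(A, b, precond=None, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient; returns (solution, iterations)."""
    apply_m = precond if precond is not None else (lambda r: list(r))
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = apply_m(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        z = apply_m(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def relative_residual(A, x, b):
    res = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    return math.sqrt(dot(res, res)) / math.sqrt(dot(b, b))


if __name__ == "__main__":
    n = 50
    dx = 1.0 / (n + 1)
    kappa = [1e-3 + (j / n) ** 2 for j in range(n + 1)]       # assumed diffusivity
    A = build_system(n, dt=0.01, dx=dx, kappa=kappa)
    b = [math.sin(math.pi * (i + 1) * dx) for i in range(n)]  # previous time level
    for name, m in (("none", None), ("Jacobi", jacobi(A))):
        x, its = pcg(A, b, precond=m)
        print(name, its, relative_residual(A, x, b))
```

Running the script prints the iteration count and final relative residual for each variant, which is the kind of comparison a solver-preconditioner benchmark tabulates. The nonsymmetric systems that arise from advection would call for solvers such as GMRES or BiCGSTAB rather than conjugate gradients.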
Chapter 2


Review  of  Ocean  Models 

2.1  Introduction 

The first generation of ocean modelling systems is based on the seminal article
by Bryan (1969). In this article, a hydrostatic, rigid-lid model is proposed with
an energy-conserving numerical scheme. While modern ocean models have become
sophisticated modelling systems with complex data assimilation schemes, adaptive
modelling capabilities, and free-surface and open boundary conditions, the numerical
schemes used for these models are still largely based on the original computational
fluid dynamics technology of the late sixties, that is, low-order finite difference and
finite volume schemes on structured grids. For a review of the first generation of
ocean models, the reader is referred to Griffies et al. (2000).

Recent advances in numerical schemes include finite volume and finite element
methods on unstructured grids. While some ocean models have used finite
volume methods (Marshall et al., 1997a,b, 1998), all of the first generation modelling
systems are based on low-order schemes on structured grids. The vertical
discretization has garnered significant attention, resulting in a number of terrain-following
coordinate schemes (Freeman et al., 1972), isopycnal vertical coordinates (Bleck and
Smith, 1990), z-coordinates (Bryan, 1969), and hybrid schemes (Spall and Robinson,
1990, Pietrzak et al., 2002). Also, curvilinear structured grids have been used in the
horizontal (Adcroft et al., 2004). Nonetheless, it has been recognized by a number
of different modelling groups that new Computational Fluid Dynamics (CFD)
technologies are suitable to be used for the second generation of ocean models. The most
prominent second generation models are summarized in Table 2.1, where “second
generation” is [for now] interpreted as those models that use unstructured grids. Refer
to Table A.1 for an additional summary with more details.

In the following sections, each of these models is described individually to
highlight the different modelling ideologies, features, and numerical methods. For a
general review of modelling efforts, the reader is referred to Pain et al. (2005) and Slingo
et al. (2009).

2.2  ADCIRC 


ADCIRC is a FEM model developed for coastal oceans, shelves, estuaries, inlets,
floodplains, rivers and beaches. The development team consists of R. Luettich
(UNC-CH), J. Westerink (ND), R. Kolar (OU), C. Dawson (UT), S. Bunya (U-Tokyo), and
E. Kubatko (OSU).

The model is actively being developed with current efforts towards upgrading
the computational engine from a CG FEM based solution to a new h-p adaptive DG
FEM based algorithm. The model can solve the following equations: two-dimensional
Shallow Water Equations (SWE); three-dimensional mass and momentum
conservation subject to incompressibility, hydrostatic and Boussinesq approximations;
the two-dimensional sediment continuity equation; and two-dimensional and
three-dimensional temperature and salinity transport equations.

Some  features  of  the  model  include: 

•  full wetting/drying elements in two and three dimensions;

•  barrier elements (such as levees);

•  conduits and porous barriers;

•  at least second order accurate numerical schemes;

•  implicit or explicit time-stepping schemes;

•  highly scalable parallel Message Passing Interface (MPI) implementation (up to
1000s of processors).

This system is written in FORTRAN 90.


Model | Name | Details

ADCIRC | ADvanced CIRCulation model | FEM (CG or DG). Designed for coastal oceans, shelves, estuaries, inlets, floodplains, rivers and beaches.

Delfin | - | FV/FD.

ELCIRC | Eulerian-Lagrangian CIRCulation model | FV/FD Eulerian-Lagrangian using prisms/quads. Developed for the Columbia River, also used for simulation of 3D baroclinic circulation across river-to-ocean scales, and for estuaries and continental shelves.

FEOM | Finite Element Ocean Model | FEM using prisms. General-purpose general circulation model solving primitive equations under the Boussinesq approximation.

Finel | - | FEM using tetrahedra. Solves 3D non-hydrostatic equations.

FVCOM | Finite Volume Coastal Ocean Model | FV using prisms. Developed for the flooding/drying process in estuaries and the tidal-, buoyancy- and wind-driven circulation in coastal regions featuring complex irregular geometry and steep bottom topography.

ICOM | Imperial College Ocean Model | FEM (CG and DG) using tetrahedra. Developed as a general model useful for all ocean regimes.

RiCOM | River and Coastal Ocean Model | FEM. Used to provide storm surge forecasts. Emphasis on coastal oceans.

SELFE | Semi-implicit Eulerian-Lagrangian Finite Element ocean model | FEM using prisms. Developed for the Columbia River, also used for simulation of 3D baroclinic circulation across river-to-ocean scales.

SEOM | Spectral Element Ocean Model | Spectral Methods (SM). Solves the 2D SWE; a solver for the primitive 3D Boussinesq equations is in development.

SLIM | Second-generation Louvain-la-Neuve Ice-ocean Model | FEM (DG or CG). Focus on global climate evolution.

SUNTANS | Stanford Unstructured Non-hydrostatic Terrain-following Adaptive Navier-Stokes Simulator | FV using prisms. Developed for coastal ocean simulations.

UnTRIM | Unstructured Tidal Residual Inter-tidal Mudflat model | FV/FD using prisms/quadrilaterals. Developed for rivers, lakes, and coastal oceans.

Table 2.1: Summary of second generation ocean models

For recent articles related to the development of ADCIRC, refer to Dawson and
Proft (2002), Bunya et al. (2005), Kubatko et al. (2006), and Forbes et al. (2007).
The development site is located at Web-ADCIRC (2006).

This modelling system has been used for a number of applications including
modelling tides (Westerink et al., 1994, Blanton et al., 2004, Jarosz et al., 2005), hurricane
storm surges (Blain et al., Gica et al., 2001), flooding (Luettich and Westerink, 1995,
Feyen et al., 2006), and wind-driven circulation and transport (Luettich et al., 1999).
It has also been used extensively for storm surge simulations in New Orleans
(Westerink et al., 2007). Finally, it is used by the U.S. Army Corps of Engineers and the
U.S. Navy, is certified by FEMA for the National Flood Insurance Program, and is
used by NOAA’s National Ocean Service for storm surge/inundation applications.


2.3  Delfin  and  Finel 

Delfin was developed by D. Ham under the supervision of J. Pietrzak and G.
Stelling at Delft University of Technology (Ham, 2006). This model is a
three-dimensional finite-volume/finite-difference model using an unstructured mesh. The
group for which this model was developed is currently studying the Indian Ocean
Tsunami. D. Ham is currently working with the ICOM group. This code is written
in C.

Finel was also developed by the same group; it is a three-dimensional
non-hydrostatic finite element model based on a tetrahedral mesh (that is, unstructured
in all three dimensions). This code is being developed by R. J. Labeur at TU Delft.
The group website is located at Web-DELFT (2009).


2.4  ELCIRC  and  SELFE 

ELCIRC and SELFE were originally developed for applications surrounding the
Columbia River estuary. The CORIE modeling system, a coastal margin observatory
for the Columbia River estuary and plume, uses SELFE and ELCIRC by default, but
has previously used other models such as POM, ADCIRC, and QUODDY. SELFE
is the newer model and has a number of improvements over ELCIRC, overcoming
restrictions in the discretization. SELFE uses the FEM, while ELCIRC uses FV in
the horizontal and FD in the vertical. The group is based out of the OGI School of
Science and Engineering, and the development team consists of A. Baptista (Scientific
director), J. Zhang, M. G. G. Foreman, D. Stucchi, E. P. Myers, A. Oliveira, and A. B.
Fortunato. The model is actively being developed, with current efforts towards solving
non-hydrostatic equations.

The current model solves the 3D shallow-water equations, with hydrostatic and
Boussinesq approximations, and transport equations for salinity and heat. The
primary variables that SELFE solves for are: the free-surface elevation; 3D velocity; 3D
salinity; and 3D temperature of the water. The numerical formulation is not explicitly
mass-conserving, but the mass-conservation properties are “very good.” Neither
ELCIRC nor SELFE uses a mode-splitting scheme or a projection method;
that is, the velocity and surface elevation are solved simultaneously (Zhang et al.,
2004, Baptista and Zhang, 2008). SELFE and ELCIRC use quadrilateral or
prismatic elements, allowing great flexibility in the choice of vertical discretization.

Some features of the model include:

•  wetting  and  drying; 

•  Z,  S,  or  mixed  S-Z  coordinates  for  vertical  discretization; 

•  ECO-SELFE  (biological  model); 

•  Semi-implicit  time  integration; 

•  Both parallel (MPI) and serial versions of the code.

This  system  is  written  using  a  combination  of  FORTRAN  and  MATLAB. 

A  recent  description  of  SELFE  can  be  found  in  Baptista  and  Zhang  (2008),  and  a 
description  of  ELCIRC  can  be  found  in  Zhang  et  al.  (2004).  Also,  the  group  website 
is  located  at  Web-CORRIE  (2009). 

ELCIRC has been applied to the Columbia River (Baptista et al., 2005), to the
St. John’s River (Myers and Aikman, 2003), to study marine ecosystem connectivity
(Robinson et al., 2005), and to stratification (Pinto et al., 2003) and tidal (Foreman
et al., 2006) studies in estuaries. SELFE has been tested extensively against standard
ocean/coastal benchmarks and used in a number of bays/estuaries around the world
(Baptista and Zhang, 2008). SELFE will be used for the same applications as ELCIRC
in the future.


2.5  FEOM 

FEOM  is  being  developed  under  the  Community  Ocean  Model  (COM)  project 
undertaken  by  the  Alfred  Wegener  Institute  (AWI)  located  in  Bremerhaven,  Germany. 
The  goal  of  the  COM  project  is  to  develop  a  general-purpose  ocean  model  based  on 
unstructured  meshes  that  contains  standard  ocean  modelling  tools,  such  as  different 
advection  schemes,  mixed  layer  parameterizations,  free  surface  boundary  conditions, 
and  generalized  vertical  coordinates.  FEOM  uses  the  FEM.  This  is  an  open  source 
project, with a number of approved developers, but the main contacts are S. Danilov,
L. Nerger, and J. Schröter.

This model is actively being developed with an emphasis on making this
unstructured grid model as efficient as a structured grid model. FEOM solves the 3D
primitive equations under the Boussinesq approximation. It uses prismatic elements,
allowing for a generalized vertical discretization. An earlier, less-efficient version of
the code used tetrahedral elements.

Some  of  the  model  features  include: 

•  Free  surface; 

•  Non-hydrostatic; 

•  Sea-ice  model; 

•  NPDZ  biological  model; 

•  Semi-implicit  time  stepping; 

•  Parallel  MPI  implementation. 
The formulation of FEOM is described in Danilov et al. (2004), and effects of vertical
discretization  are  discussed  in  Wang  et  al.  (2008).  The  AWI  website  is  located  at 
Web-AWI  (2009)  and  the  FEOM  project  page  is  located  at  Web-FEOM  (2009). 

The model has been used for studying circulation and bottom pressure in the
Atlantic (Boning et al., 2006), assimilation of sea-surface height data from the TAN-
DAM project (Nerger et al., 2006), and studying the influence of tidal forcing and
topography representation in the Weddell Sea (Wang et al., 2009).

2.6  FVCOM 

FVCOM was originally developed for the wetting/drying process in
estuaries and the tidal-, buoyancy- and wind-driven circulation in coastal regions with
complex irregular geometry and steep bottom topography. FVCOM uses the FV
method. The FVCOM group is based at the University of Massachusetts-Dartmouth,
and the main development team consists of C. Chen (UMass), G. Cowles (UMass),
and R. C. Beardsley (WHOI).

This model is mature, but is still actively being developed, with current focus on
the non-hydrostatic solver. The hydrostatic model solves the 3D primitive equations
with a mode-splitting scheme. This model uses prismatic elements, with the
vertical discretization employing terrain-following coordinates. FVCOM is a complete
modelling system, with some of the model features including:

•  Free  surface; 

•  Non-hydrostatic; 

•  wetting/drying  elements; 

•  Biological  models; 

•  Fully  non-linear  ice  models; 

•  Wave  model; 

•  Semi-implicit  time  stepping; 

•  Parallel  MPI  implementation. 
The system is written in FORTRAN 90.

The formulation of FVCOM is described in Chen et al. (2003), with a comparison
between structured and unstructured grids in Chen et al. (2007). The group website
is located at Web-FVCOM (2009).

FVCOM has seen a number of applications in bays (Zhao et al., 2006, Chen
et al., 2008), estuaries (Xue et al., 2009), lakes (Chen et al., 2003), seas (Chen et al.,
2003) and other regimes. A complete listing of current projects can be found at
Web-FVCOM (2009).

2.7  ICOM 

ICOM is being developed for use as a general ocean circulation model. The model
uses sophisticated anisotropic mesh adaptivity in three dimensions. The ICOM group
is based at Imperial College London, but collaborates with Oxford, the
National Oceanography Centre in Southampton, and the Proudman Oceanographic
Laboratory in Liverpool. ICOM uses either the CG FEM or the DG FEM, with
research toward determining the best type of finite element to use for ocean
applications. Some of the developers responsible for developing the model are C. C. Pain
(Project head), D. A. Ham, M. D. Piggott, C. J. Cotter, A. J. H. Goddard, C. R. E.
De Oliveira, and A. P. Umpleby.

A  large  team  is  actively  developing  this  model.  ICOM  uses  tetrahedral  elements 
with  anisotropic  adaptivity  in  all  three  dimensions.  The  model  solves  the  three- 
dimensional  non-hydrostatic  Boussinesq  equations.  A  projection  method  is  used  to 
enforce  the  continuity  constraint.  Some  of  the  model  features  include: 

•  Sophisticated  dynamic  anisotropic  mesh  adaptivity; 

•  Free-surface; 

•  Non-hydrostatic; 

•  NPDZ  biology  model; 

•  wetting  and  drying; 

•  Implicit time stepping;

•  Sophisticated load-balanced domain decomposition parallel implementation.


The model is described in Ford et al. (2004a) and Piggott et al. (2008), and
validated in Ford et al. (2004b). Mesh adaptivity is discussed in Piggott et al. (2005)
with the optimization metric discussed in Power et al. (2006). The development
website is located at Web-ICOM (2008).

This model is still in the development phase and has not seen much realistic
application. It promises to be useful for modelling Western boundary currents, flow
over topography, open ocean deep convection, gravity currents, internal wave
breaking, salt fingering, tidal modelling, tsunami modelling, North Atlantic thermohaline
circulation, and wetting and drying.


2.8  RiCOM 

RiCOM was developed by R. A. Walters, and is used to provide storm surge
forecasts for New Zealand. It is a three-dimensional primitive equation
hydrodynamic model with semi-implicit time stepping and a semi-Lagrangian advection
scheme. It uses the CG FEM on triangular and quadrilateral elements. It also has a
non-hydrostatic pressure option. This model is embedded into a New Zealand
forecasting system that includes a Local Area Weather model, a sea surface height model,
and a wave model.
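
The semi-Lagrangian advection idea mentioned above can be sketched as follows: each grid value is updated by tracing the flow backwards to its departure point and interpolating the old field there, which keeps the update stable even when the Courant number exceeds one. This is a minimal 1D illustration assuming a periodic uniform grid, a constant velocity, and linear interpolation; it is not RiCOM's implementation, and the function name `semi_lagrangian_step` is hypothetical.

```python
import math


def semi_lagrangian_step(c, courant):
    """Advance a periodic 1D tracer field by one semi-Lagrangian step.

    c        -- list of tracer values on a uniform periodic grid
    courant  -- u * dt / dx for a constant velocity u (may exceed 1)
    """
    n = len(c)
    new = [0.0] * n
    for i in range(n):
        s = i - courant           # departure point, in grid-index units
        j = math.floor(s)         # grid node at or to the left of it
        w = s - j                 # fractional distance past node j
        # Linear interpolation of the old field at the departure point.
        new[i] = (1.0 - w) * c[j % n] + w * c[(j + 1) % n]
    return new


if __name__ == "__main__":
    field = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
    print(semi_lagrangian_step(field, 2.0))   # whole-cell shift
    print(semi_lagrangian_step(field, 2.5))   # shift plus interpolation
```

With an integer Courant number the departure points coincide with grid nodes and the field is shifted without change; with a fractional Courant number the linear interpolation introduces smoothing, which is one reason higher-order interpolation is often preferred in practice.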

The model formulation is described in Walters (2005b) and validation for storm
surge forecasting is discussed in Lane and Walters (2009). The RiCOM group has published
results related to ocean modelling, from the type of FEM to use (Walters, 2005a,
Walters and Barragy, 1997), to solution methods (Barragy and Walters, 1998, Walters
et al., 2007), to model design considerations (Walters, 2006).
2.9  SEOM


SEOM is being developed for large-scale ocean applications. The eventual goal
is to solve the three-dimensional non-hydrostatic primitive equations, but SEOM
currently has a robust solver for the two-dimensional, depth-integrated shallow water
equations. SEOM3D solves the primitive hydrostatic and Boussinesq Navier-Stokes
equations in three dimensions. This model uses a high-order Spectral Element Method
(SEM) on unstructured rectangular meshes in the horizontal and sigma coordinates in
the vertical. SEOM is developed by M. Iskandarani (Project head), D. B. Haidvogel,
J. C. Levin, and J. P. Boyd.
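
To illustrate the difference between the sigma (terrain-following) and stair-case z-coordinate discretizations referred to in this chapter, the following minimal sketch places vertical interfaces in a single water column. It assumes the common mapping z = eta + sigma (H + eta) with sigma in [-1, 0], where H is the resting depth and eta the free-surface elevation; the function names and the sample z-levels are hypothetical and are not taken from SEOM or any other model described here.

```python
def sigma_interfaces(depth, eta, n_layers):
    """Interface heights (seabed to surface) of a terrain-following column.

    Uses z = eta + sigma * (depth + eta) with sigma evenly spaced in [-1, 0].
    """
    column = depth + eta
    return [eta + (-1.0 + k / n_layers) * column for k in range(n_layers + 1)]


def active_z_cells(depth, z_interfaces):
    """Number of stair-case cells whose lower face lies at or above the seabed.

    z_interfaces lists fixed levels from the surface downwards (z <= 0).
    """
    return sum(1 for z_bottom in z_interfaces[1:] if z_bottom >= -depth)


if __name__ == "__main__":
    z_levels = [0.0, -10.0, -20.0, -40.0, -80.0, -160.0]  # assumed fixed levels
    for depth in (50.0, 100.0):
        print(depth, sigma_interfaces(depth, 0.0, 4), active_z_cells(depth, z_levels))
```

The sigma column always contains the same number of layers, stretched to fit the local depth, whereas the number of active z-cells changes with the bathymetry, producing the stair-case representation of the seabed.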

Some  of  the  model  features  include: 

•  h-p refinement;

•  Free-surface; 

•  Non-hydrostatic; 

•  Semi-implicit  time  stepping; 

•  MPI  parallel  implementation. 

SEOM  was  originally  written  in  C,  but  was  later  re-coded  in  FORTRAN  90. 

The formulation of the two-dimensional SEOM is described in Iskandarani et al.
(1995), and the three-dimensional formulation is described in Iskandarani et al. (2003).
The SEOM development website is located at Web-SEOM (2009).

The two-dimensional version of SEOM has seen a number of applications,
including an investigation of the wind-driven circulation in the Mediterranean Sea (Molcard
et al., 2002) and an investigation of the dynamics of the long period tides in the global
ocean (Wunsch et al., 1997).

2.10  SLIM 

SLIM is developed as an ice-ocean model for simulating global climate and
large-scale ocean applications. SLIM is based out of UCL, with partners from all
around the world. SLIM uses both the CG FEM and the DG FEM, with research
aimed towards determining the best type of finite element for ocean applications.
SLIM is being developed by a large team, with members including E. Deleersnijder,
T. Fichefet, V. Legat, J.-F. Remacle, E. Hanert, C. König Beatty, L. White (UCL),
J.-M. Beckers (ULG), V. Dehant (Royal Observatory of Belgium), O. de Viron (IPGP),
E. Delhez (ULG), E. Hanert (UoR, UK), D. Le Roux (ULaval), and E.
Wolanski (AIMS).

The  model  is  actively  being  developed  with  current  efforts  focused  on  the  three- 
dimensional  implementation.  SLIM  uses  horizontally  adaptive  unstructured  prismatic 
meshes.  The  governing  equations  currently  solved  for  include  the  depth-integrated 
shallow-water  equations  and  the  three-dimensional  hydrostatic  primitive  equations. 
The  SLIM  group  has  generated  a  number  of  high-quality  unstructured  meshes  for 
various  ocean  regimes.  Some  of  the  current  model  features  include: 

•  Adaptive  horizontal  mesh; 

•  Sea-ice  model; 

• State-of-the-art sub-grid-scale parameterizations;

•  Semi-implicit  time  stepping; 

•  Parallel. 


The SLIM project has computational results in tracer advection (White, 2008), mesh
adaptation (Remacle et al., 2005, 2006), mesh generation (Legrand et al., 2007),
and dispersion analysis (Bernard et al., 2008, Le Roux, 2005). The two-dimensional
formulation is outlined in Le Roux et al. (2000) and the three-dimensional formulation
is outlined in White et al. (2008). The group development website is located at
Web-SLIM (2009).

Since the model is still under development, it has not seen a large number of realistic
applications. Once completed, it should be useful as a general ocean circulation
model with application to multi-scale ocean processes.

2.11 SUNTANS


SUNTANS is an unstructured grid, non-hydrostatic, finite volume coastal ocean
simulator. The development team is based at Stanford, but includes a number of
researchers from around the world. The core development team at Stanford consists
of A. Boehm, D. Fong, O. Fringer, M. Gerritsen, E. Gross, J. Koseff, S. Monismith,
R. Naylor, R. Street, and S. Sankaranarayanan. The model is mature, and one of
the current development projects is nesting this unstructured grid model within the
existing structured grid Rutgers Regional Ocean Modeling System (ROMS) (Fringer
et al., 2006a). SUNTANS uses prismatic meshes and a stair-case representation of
the bottom topography. The stair-case representation is combined with the Immersed
Boundary Method (IBM) to properly resolve bottom topography. The model solves
the Navier Stokes equations under the Boussinesq assumption, and uses an LES
turbulence closure model. Some of the model features include:

•  Non-hydrostatic; 

•  Large  eddy  simulation  for  resolved  features; 

•  Z-level  coordinates  combined  with  the  immersed  boundary  method  for  accurate 
topography  on  bottom; 

• Wetting and drying;

•  Semi-implicit  time  stepping  using  the  Theta  method; 

•  Parallel  (MPI)  implementation. 


The formulation for SUNTANS is described in Fringer et al. (2006b), and the model
is based on the method by Casulli (1999). The group development site is located at
Web-SUNTANS (2009).

The majority of SUNTANS applications have been related to internal tides (Jachec
et al., 2006, 2007, Venayagamoorthy and Fringer, 2005), but the code could also be
used for applications in estuaries and coastal oceans.

2.12 UnTRIM


UnTRIM  is  an  unstructured  orthogonal  grid  finite  volume  or  finite  difference 
model  using  prismatic  or  quadrilateral  elements.  It  was  developed  for  estuaries, 
lakes,  and  coastal  oceans.  It  was  developed  by  V.  Casulli  from  Trento  University, 
Italy.  The  model  solves  the  three-dimensional  shallow  water  equations,  as  well  as 
three-dimensional  transport  equations  for  salt,  heat,  dissolved  matter,  and  suspended 
sediments.  Some  of  the  model  features  include: 

•  Non-hydrostatic; 

•  Free  surface; 

•  Semi-implicit  time  stepping; 

•  Parallel  (MPI)  implementation. 

Publications  related  to  the  development  of  UnTRIM  are:  Casulli  (1999),  Casulli 
and  Walters  (2000),  Casulli  and  Zanolli  (2002,  2005),  and  the  code  can  be  obtained 
from:  Web-UNTRIM  (2009). 

UnTRIM has a large user base, and has been applied to storm surge predictions
in bays (Shen et al., 2006a,b) and rivers (Liu et al., 2008).

2.13  Discussion  and  Conclusions 

In order to compare and contrast the different models, Figure 2-1 was created.
There are two main groups of models, the FEM and FV models. In general, the FV
models are more mature, likely because the technology is older. In the FV group,
FVCOM and SUNTANS have greater flexibility in their meshes, since UnTRIM requires
orthogonal unstructured meshes. Also, FVCOM is a mature modelling system,
not just a solver for dynamics. UnTRIM also has a large user base, but SUNTANS
and FVCOM are expected to see increased usage in the future.

In general, the FEM models are less mature, highlighted by the fact that most
do not have non-hydrostatic solver options. The ADCIRC, SELFE/ELCIRC, and
FEOM models have been used for realistic, albeit somewhat specialized, applications.
RiCOM and DELFIN/FINLAB have fewer developers, and the models are less
sophisticated than the top row. SEOM is still under development, although it has
seen more realistic usage than either ICOM or SLIM. ICOM and SLIM are the most
sophisticated models reviewed, with their major advantage being adaptive meshing,
although this does lead to an increase in development time due to the additional
complexity.

Most of the second generation models reviewed use the FEM with some form of
non-conforming or discontinuous element. The FEM offers a number of advantages
over the FV method. Specifically, the FEM variational framework allows closed-form
proofs about the numerical schemes in terms of consistency and stability. Also, higher
order schemes are more easily formulated, providing a flexible code capable of
arbitrarily high order schemes. The FEM can also be generalized for arbitrarily shaped
elements, allowing a single implementation capable of having mixed elements within
the same mesh. The FEM is more general than the FV method, allowing greater
flexibility when developing new schemes. In fact, FV methods can be cast in terms of
the FEM. Among the disadvantages of traditional FEMs is increased complexity in
the implementation, and CG FEMs have difficulty stabilizing advection-dominated
flows, leading to complicated stabilization schemes. Newer DG schemes do not have
difficulty stabilizing advection-dominated flows, but they suffer from poorer
computational efficiency and complications with second order or higher derivatives. Despite
the disadvantages, most of the second generation model developers chose to use the
FEM with a non-conforming or discontinuous discretization.

Because the DG schemes are newer with less established practices, and because
they offer exciting new possibilities for solving advection-dominated flows, it was
decided to investigate these schemes further. The next chapter provides an overview
of DG methods.

Figure 2-1: Second generation unstructured grid ocean modelling systems

Chapter 3


Discontinuous  Galerkin  (DG) 
Methods 


It  is  assumed  that  the  reader  is  familiar  with  FD  and  FV  methods,  but  that  a 
brief  review  of  the  FEM  is  in  order. 

First, a consistent notation is introduced that will be used throughout this document.
Referring to Figure 3-1, the problem domain is specified as $\Omega$, and its boundary
as $\partial\Omega$. If a boundary has a specified type "D", that boundary will be indicated as
$\partial\Omega_D$. The discretized triangulation is represented by $\mathcal{T}_h$. Individual elements within
$\mathcal{T}_h$ are represented with $K_i$, where the subscript is used to refer to a specific triangle
in the triangulation or omitted when referring to a general triangle. The boundary of
an element $K_i$ is indicated by $\partial K_i$. Thus we can say $\mathcal{T}_h = \bigcup_i K_i$. We also define
the set containing all the edges in the domain $\varepsilon_h = \bigcup_i \partial K_i$, the set containing all
domain-boundary edges $\varepsilon_h^{\partial} = \varepsilon_h \cap \partial\Omega$, and the set containing all domain-interior
edges $\varepsilon_h^{0} = \varepsilon_h \setminus \varepsilon_h^{\partial}$.

Figure 3-1: Notation definition for domain

Consider:

$$\frac{\partial u}{\partial t} + \nabla \cdot \mathbf{F}(u) - S(u) = 0 \tag{3.1}$$

The numerical solution, $u_h$, to this equation will have a residual
$R = \frac{\partial u_h}{\partial t} + \nabla \cdot \mathbf{F}(u_h) - S(u_h)$. In FEM, we try to set $R = 0$ over specified weighting (or test)
functions on the element, and this is known as the Method of Weighted Residuals
(MWR) (Chapra and Canale, 2006). That is, we set

$$\int_\Omega R \, w \, d\Omega = 0 \tag{3.2}$$

where $w$ is the test function. If $w$ and the numerical solution $u_h$ were in an
infinite-dimensional space, then $u_h$ would satisfy the equations exactly. However, by the very
nature of discrete, numerical solutions, the space of $w$ and $u_h$ cannot be
infinite-dimensional: numerical solutions must reside in a finite-dimensional space. The choice
of space will make a significant difference in the type of FEM and the solution method.

There are a number of standard choices for the test function, including:

1.  Collocation 

2.  Subdomain 

3.  Galerkin 

With the Collocation method, $w = \delta(\mathbf{x}_j)$, that is, the test functions are chosen as
delta functions at discretely chosen points $\mathbf{x}_j$. With the Subdomain method, $w = C|_K$,
that is, $w$ is chosen as a constant, $C$, over the triangle $K$. With the Galerkin method,
$w$ is chosen to be the same as the basis function, $\theta$, used to represent $u_h$, that is,
$w = \theta$.
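The three choices can be compared on a small model problem. The sketch below is only an illustration and is not taken from this thesis: it assumes the one-dimensional problem $u'' + u + x = 0$ on $(0, 1)$ with $u(0) = u(1) = 0$, and a single hypothetical trial function $\theta(x) = x(1 - x)$, so that $u_h = a\,\theta(x)$ has one unknown coefficient $a$. Each choice of $w$ sets the weighted residual to zero and yields a slightly different $a$.

```python
import numpy as np

# Hypothetical model problem (not from the thesis): u'' + u + x = 0 on (0, 1),
# u(0) = u(1) = 0, with the one-term trial solution u_h(x) = a * theta(x).
def theta(x):
    return x * (1.0 - x)

def residual_parts(x):
    """Return (r1, r0) such that the residual is R(x; a) = a * r1(x) + r0(x)."""
    r1 = -2.0 + theta(x)   # theta'' + theta, with theta'' = -2
    r0 = x                 # forcing term
    return r1, r0

# Gauss-Legendre rule mapped from [-1, 1] to [0, 1]
xg, wg = np.polynomial.legendre.leggauss(4)
xq, wq = 0.5 * (xg + 1.0), 0.5 * wg

def solve_weighted(weight):
    """Solve int_0^1 R(x; a) * weight(x) dx = 0 for the coefficient a."""
    r1, r0 = residual_parts(xq)
    w = weight(xq)
    return -np.sum(wq * w * r0) / np.sum(wq * w * r1)

# Collocation: w is a delta function at x = 0.5, so R(0.5; a) = 0
r1_c, r0_c = residual_parts(0.5)
a_collocation = -r0_c / r1_c

# Subdomain: w is a constant over the whole domain
a_subdomain = solve_weighted(lambda x: np.ones_like(x))

# Galerkin: w is the basis function itself
a_galerkin = solve_weighted(theta)

print(a_collocation, a_subdomain, a_galerkin)
```

For this assumed problem the three coefficients are 2/7, 3/11, and 5/18 respectively, which shows that the choice of test function changes the discrete solution even when the trial space is identical.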

To discretize the solution in FEM, a finite-dimensional basis that attempts
to represent the shape of the true solution is used. This basis is finite and incomplete,
that is, it may not reproduce the true solution. Thus, we say that the continuous
true function is expressed, for example, as

$$u(\mathbf{x}, t) \approx u_h(\mathbf{x}, t) = \sum_i u_{h,i}(t)\,\theta_i(\mathbf{x}) \tag{3.3}$$

$$u_h(\mathbf{x}, t) = \sum_{i=1}^{N_p} u_{h,i}(t)\,\theta_i(\mathbf{x}) \tag{3.4}$$

$$u_h(\mathbf{x}, t) = \sum_{i=1}^{N_p} u_{h,i}(t, \mathbf{x}_i)\,\theta_i(\mathbf{x}) \tag{3.5}$$

Figure 3-2: Example of one-dimensional quadratic nodal and modal bases (panels: Nodal Basis, Modal Basis)

where 3.3 is a generic representation using a basis $\theta_i(\mathbf{x})$, 3.4 is an example of a modal
basis where the unknown coefficients $u_{h,i}(t)$ are a function of time and
related to a specific mode, and 3.5 is an example of a nodal basis $\theta_i(\mathbf{x})$
where the unknown coefficients $u_{h,i}(t, \mathbf{x}_i)$ are a function of time and related to a
specific point in space $\mathbf{x}_i$, and $N_p$ is the number of points or modes. Note that the
notation "$(\cdot)_h$" is used to indicate the discretized solution, which is dependent on the
mesh size characterized by the value "$h$". A nodal basis function is equal to one at a particular
node, and zero at all other nodes. A modal basis function is usually non-zero on the entire
element, but is related to a specific mode or polynomial power. An example of
one-dimensional quadratic nodal and modal bases is shown in Figure 3-2.

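As an example, with this machinery in place, the discretization of $\int_\Omega \frac{\partial u}{\partial t} w \, d\Omega$
proceeds using 3.3 as follows:

$$\int_\Omega \frac{\partial u}{\partial t} w \, d\Omega \approx \int_\Omega \frac{\partial \left( \sum_i u_{h,i}\,\theta_i \right)}{\partial t} w_h \, d\Omega = \sum_i \frac{\partial u_{h,i}}{\partial t} \int_\Omega \theta_i \, w_h \, d\Omega$$

Choosing $w_h$ in the Galerkin sense, that is $w_h = \theta_j$, $j = 1 \ldots N_p$, we have

$$\int_\Omega \frac{\partial u}{\partial t} \theta_j \, d\Omega \approx \sum_i \frac{\partial u_{h,i}}{\partial t} \int_\Omega \theta_i \, \theta_j \, d\Omega$$

which can be written as a matrix-vector multiplication

$$\mathbf{M} \frac{\partial \mathbf{u}_h}{\partial t}$$

where $M_{ij} = \int_\Omega \theta_i \theta_j \, d\Omega$ is known as the mass matrix. Note, in this example, $\theta$ could
be either a modal or nodal basis, and can be defined in any appropriate space. The
FEM is a powerful numerical method that allows flexibility through the choice of
basis and test functions. In particular, FD and FV schemes can be recovered using
the FEM. The FEM also allows for great geometric flexibility since the formulation
is not dependent on the discretization, enabling the use of unstructured grids.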
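As a concrete illustration of the mass matrix, the following sketch evaluates $M_{ij} = \int \theta_i \theta_j \, dx$ for a one-dimensional quadratic nodal basis on the reference interval $[-1, 1]$. The element, the node locations, and the function names `lagrange_basis` and `mass_matrix` are assumptions made for this example only; the thesis works with two-dimensional triangles.

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Evaluate the nodal (Lagrange) basis defined by `nodes` at points `x`.

    Returns an array of shape (len(x), len(nodes)); column i is theta_i(x).
    """
    n = len(nodes)
    # Monomial coefficients of each basis function, from V @ coeffs = I
    coeffs = np.linalg.inv(np.vander(nodes, n, increasing=True))
    return np.vander(x, n, increasing=True) @ coeffs

def mass_matrix(nodes, n_quad=None):
    """M_ij = int_{-1}^{1} theta_i(x) theta_j(x) dx by Gauss-Legendre quadrature."""
    n = len(nodes)
    n_quad = n_quad or n   # n points are exact for degree 2n - 1 >= 2(n - 1)
    xq, wq = np.polynomial.legendre.leggauss(n_quad)
    theta = lagrange_basis(nodes, xq)          # shape (n_quad, n)
    return theta.T @ (wq[:, None] * theta)     # sum_k w_k theta_i(x_k) theta_j(x_k)

nodes = np.array([-1.0, 0.0, 1.0])   # quadratic nodal element
M = mass_matrix(nodes)
print(M * 15.0)   # entries are multiples of 1/15 for this element
```

For this element the result is $\frac{1}{15}$ times the matrix with rows $(4, 2, -1)$, $(2, 16, 2)$, $(-1, 2, 4)$. Evaluating `lagrange_basis(nodes, nodes)` returns the identity matrix, which is the nodal property described above.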


3.1  Introduction  to  DG 

The first reported use of DG FEM was by Reed and Hill (1973), where DG was used
to solve the steady-state neutron transport equation. However, DG drew little attention
until a series of papers (Cockburn and Shu, 1989, Cockburn et al., 1989, 1990,
Cockburn and Shu, 1998b), where the Runge-Kutta DG methods were described.
The extension of DG to higher order derivatives (Bassi and Rebay, 1997) made the
method applicable to solving advection-diffusion equations, which can be extended
to solving the Navier Stokes equations. Since the late 1990s, DG has seen a number of
realistic applications in aerospace, solid mechanics, and electromagnetism, to name a
few.

The major theoretical difference between CG and DG lies in the approximation
subspaces used. DG uses bases that are in the normed space $L^2(\Omega)$ while CG uses bases
that are in the Hilbert space $H^1(\Omega)$, that is, the function has to be continuous across
elements. For a function $f(\mathbf{x})$ to be in $L^2(\Omega)$ it has to satisfy $\int_\Omega f(\mathbf{x})^2 \, d\Omega < \infty$,
whereas a function in $H^1(\Omega)$ has to belong to a smaller space satisfying
$\int_\Omega f(\mathbf{x})^2 + \nabla f(\mathbf{x}) \cdot \nabla f(\mathbf{x}) \, d\Omega < \infty$. Figure 3-3 illustrates the difference between a one-dimensional
DG space, and a one-dimensional CG space (both using a nodal basis). Notice for the
DG scheme the slope is undefined across the element boundary, and thus the solution
cannot reside in $H^1(\Omega)$. Also note that in the example shown, the CG scheme has
four degrees of freedom while the DG scheme has six degrees of freedom due to the
doubling of information at element boundaries.
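The degree-of-freedom counts quoted above can be reproduced with a short calculation. The sketch below assumes a one-dimensional mesh with a degree-$p$ nodal basis; the configuration of three linear elements is an assumption chosen because it is consistent with the four and six degrees of freedom mentioned in the text, and the function name `count_dofs_1d` is hypothetical.

```python
def count_dofs_1d(n_elements, p):
    """Degrees of freedom for a 1D mesh of n_elements with a degree-p nodal basis.

    CG shares the node at each interior element boundary; DG duplicates it.
    """
    cg = n_elements * p + 1
    dg = n_elements * (p + 1)
    return cg, dg

cg, dg = count_dofs_1d(n_elements=3, p=1)
print(cg, dg)
```

The ratio of DG to CG unknowns in one dimension is roughly $(p + 1)/p$, so the relative overhead of the duplicated unknowns shrinks as the polynomial order increases.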


Figure 3-3: Difference between solution when using a discontinuous (left) or a
continuous (right) basis


The duplication of unknowns is commonly quoted as a disadvantage of DG compared
to CG, because there is an inherent increase in computational cost associated
with a larger number of unknowns. However, proper studies comparing the error
level (Kubatko et al., 2009) suggest that this disadvantage may not be as dramatic
as stated for specific types of problems. The disadvantage of DG relative to FD methods
is increased development time as well as decreased computational efficiency per
degree of freedom. Apart from the efficiency issues, DG has a number of advantageous
properties that promote its use, including:

• Localized memory access patterns. The local nature of DG elements allows
improved scalability on parallel architectures, and promises to take better
advantage of newer graphics processing units (GPUs) that are geared towards
massively parallel computations.

• Higher order accuracy. Since DG belongs to the FEM variational framework,
the same interpolation theory applies. That is, $O(h^{p+1})$ convergence, where $p$ is
the order of the basis function used, can be obtained. Obtaining higher-order
rates of convergence for FV on unstructured grids is difficult, and requires
information from neighboring elements. Both FD and FV require large, non-compact
stencils to reach high order. The advantage for DG, then, is obtaining high-order convergence while
maintaining the compact stencil.

• Adaptive strategies. The local nature of DG elements allows for a local element
interpolation function of arbitrary order with no restrictions imposed by
neighboring volumes. That is, the DG framework easily allows for non-conforming
discretization which facilitates the use of h (adapting the triangulation) and p
(adapting the order of the basis) adaptation strategies.

• Designed for advection-dominated flows. Where FV schemes struggle to achieve
higher-order accuracy for advection, DG along with an appropriate Riemann
solver easily generalizes to use arbitrarily high-order advection schemes for
smooth solutions.

• Superconvergence properties for dispersion and dissipation. DG demonstrates
superconvergence for the dispersion and dissipation of waves (Bernard et al.,
2008), making it well suited to wave propagation problems.

• Complex geometries. Because DG fits into the FEM framework, it is easily
generalized for use on arbitrarily shaped elements, making it suitable for use
with unstructured grids to model complex geometries.

By using DG, then, one gains a great deal of flexibility in terms of flux stabilization
schemes, geometry, and the order of the scheme at the cost of arguably greater
computational expense compared to CG, FV, and FD.


Figure  3-4:  Notation  for  plus  and  minus  triangular  elements 


Finally, some convenient notation to mathematically express the jumps across
elements needs to be introduced. Often, in the literature, two elements bordering an
edge are labeled $K^+$ and $K^-$, with associated outward pointing normals $\hat{\mathbf{n}}^+$ and $\hat{\mathbf{n}}^-$
respectively, as shown in Figure 3-4. (Alternatively, sometimes one element is referred
to as the "left" while the other is referred to as the "right".) The mean values $\{\!\{\cdot\}\!\}$
and jumps $[\![\cdot]\!]$ are then defined as follows

$$\{\!\{\mathbf{v}\}\!\} = (\mathbf{v}^+ + \mathbf{v}^-)/2 \qquad \{\!\{w\}\!\} = (w^+ + w^-)/2$$

$$[\![\mathbf{v} \cdot \hat{\mathbf{n}}]\!] = \mathbf{v}^+ \cdot \hat{\mathbf{n}}^+ + \mathbf{v}^- \cdot \hat{\mathbf{n}}^- \qquad [\![w \hat{\mathbf{n}}]\!] = w^+ \hat{\mathbf{n}}^+ + w^- \hat{\mathbf{n}}^-$$

where $\mathbf{v}$ is a generic vector and $w$ is a generic scalar. Note that the jump of a vector
is a scalar while the jump of a scalar is a vector. Furthermore, note that the jump
will be zero for a continuous function. Now it is possible to discretize an equation
using DG, as follows in the next section.
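The mean and jump operators translate directly into code. The following is a minimal sketch, assuming that edge values are available as NumPy-compatible arrays and that the two outward normals satisfy $\hat{\mathbf{n}}^- = -\hat{\mathbf{n}}^+$; the function names are hypothetical.

```python
import numpy as np

def average(a_plus, a_minus):
    """Mean value {{a}} of a scalar or vector across an edge."""
    return 0.5 * (np.asarray(a_plus) + np.asarray(a_minus))

def jump_vector(v_plus, v_minus, n_plus):
    """Jump [[v . n]] of a vector across an edge: a scalar."""
    n_plus = np.asarray(n_plus, dtype=float)
    n_minus = -n_plus   # outward normals of the two neighbours are opposite
    return np.dot(v_plus, n_plus) + np.dot(v_minus, n_minus)

def jump_scalar(w_plus, w_minus, n_plus):
    """Jump [[w n]] of a scalar across an edge: a vector."""
    n_plus = np.asarray(n_plus, dtype=float)
    n_minus = -n_plus
    return w_plus * n_plus + w_minus * n_minus

n_plus = np.array([1.0, 0.0])
print(jump_scalar(2.0, 2.0, n_plus))                 # continuous scalar: zero jump
print(jump_scalar(3.0, 1.0, n_plus))                 # discontinuous scalar
print(jump_vector([1.0, 2.0], [0.5, 2.0], n_plus))   # discontinuous vector
```

The first call confirms that a continuous quantity has zero jump, while the other two return a vector and a scalar respectively, as noted above.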
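3.2 DG formulation for advection problems

Consider the advection of a scalar quantity $u$ with flux $\mathbf{F}(u)$ and source term $S(u)$
satisfying

$$\frac{\partial u}{\partial t} + \nabla \cdot \mathbf{F}(u) = S(u), \quad \text{in } (0, T) \times \Omega \tag{3.6}$$

$$u = g_D, \quad \text{on } \partial\Omega_D \tag{3.7}$$

$$\mathbf{F}(u) \cdot \hat{\mathbf{n}} = g_N, \quad \text{on } \partial\Omega_N \tag{3.8}$$

$$u = u_0, \quad \text{at } t = 0 \text{ in } \Omega \tag{3.9}$$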

over the domain $\Omega$ from time $0$ to time $T$, where $g_D$ is the value of the Dirichlet boundary
condition on $\partial\Omega_D$ and $g_N$ is the value of the Neumann boundary condition on $\partial\Omega_N$. Let $\mathcal{P}^p(D)$ denote the
set of polynomials of degree at most $p$ on a domain $D$. Discretizing the domain with a
triangulation $\mathcal{T}_h = \bigcup_{i=1}^{N_t} K_i$ of non-overlapping elements $K_i$, where $N_t$ is the number
of triangles, we seek an approximation $u_h$ of $u$ with $u_h \in W_h^p$ where

$$W_h^p = \{ w \in L^2(\Omega) : w|_K \in \mathcal{P}^p(K), \ \forall K \in \mathcal{T}_h \} \tag{3.10}$$

such that:

$$\int_K \left[ \frac{\partial u_h}{\partial t} w + [\nabla \cdot \mathbf{F}(u_h)] \, w \right] dK = \int_K S(u_h) \, w \, dK, \quad \forall K \in \mathcal{T}_h \tag{3.11}$$

For readers unfamiliar with the notation, equation 3.10 reads: take $w$ to be in the
$L^2$ space that exists on $\Omega$ such that $w$ restricted to an element $K$ lies in the polynomial
space $\mathcal{P}^p$ that exists on $K$. Equation 3.11 is not a complete DG formulation, since
currently the solutions on individual elements are not coupled. Following an approach
similar to the FV method, we integrate the advection term by parts


$$\int_K \frac{\partial u_h}{\partial t} w \, dK + \int_K \nabla \cdot [\mathbf{F}(u_h) w] \, dK - \int_K \mathbf{F}(u_h) \cdot \nabla w \, dK = \int_K S(u_h) w \, dK$$

$$\int_K \frac{\partial u_h}{\partial t} w \, dK + \int_{\partial K} \hat{\mathbf{F}}(u_h) \cdot \hat{\mathbf{n}} \, w \, d\partial K - \int_K \mathbf{F}(u_h) \cdot \nabla w \, dK = \int_K S(u_h) w \, dK \tag{3.12}$$

where the second step follows from the Divergence theorem. Note that the notation
$\hat{\mathbf{F}}(u_h)$ indicates that the flux on the edge bordering two elements is a function
of both the bordering elements, thereby achieving a coupling between elements. In
order to satisfy conservation, the value of $\hat{\mathbf{F}}(u_h)$ is taken to be single-valued on an edge,
that is, the two bordering elements will use the same value of $\hat{\mathbf{F}}(u_h)$. Equation 3.12
then gives the weak formulation of 3.6, and the scheme will be complete as soon as
the functional form of $\hat{\mathbf{F}}(u_h)$ is specified. Alternatively, a strong formulation for the
problem can be found by taking an additional integration by parts in equation 3.12
as follows


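$$\int_K \frac{\partial u_h}{\partial t} w \, dK + \int_{\partial K} \left[ \hat{\mathbf{F}}(u_h) - \mathbf{F}(u_h) \right] \cdot \hat{\mathbf{n}} \, w \, d\partial K + \int_K [\nabla \cdot \mathbf{F}(u_h)] \, w \, dK = \int_K S(u_h) w \, dK \tag{3.13}$$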


where the second application of the Divergence theorem uses $\mathbf{F}(u_h)$ instead of $\hat{\mathbf{F}}(u_h)$
to obtain a unique formulation (otherwise we recover 3.11). While 3.12 and 3.13 are
mathematically equivalent, their numerical implementations are different, and there
are some advantages in terms of implementation and efficiency using one form over
the other for some problems. Also, after a re-arrangement of 3.13

$$\int_K \left[ \frac{\partial u_h}{\partial t} + \nabla \cdot \mathbf{F}(u_h) - S(u_h) \right] w \, dK = \int_{\partial K} \left[ \mathbf{F}(u_h) - \hat{\mathbf{F}}(u_h) \right] \cdot \hat{\mathbf{n}} \, w \, d\partial K$$

it is highlighted that the residual on the borders of an element serves to couple
elements within a triangulation.


The $\hat{\mathbf{F}}(u_h)$ term is not present in CG FEM discretizations, but for DG schemes, the
proper specification of $\hat{\mathbf{F}}(u_h)$ can stabilize the numerical scheme. The problem with
CG FEM discretizations when it comes to advective problems is that CG schemes
inherently use a "central" difference type discretization for the flux. While this is
more accurate than an upwind scheme, it is well known that central schemes tend
to be unstable (Chapra and Canale, 2006) for advective problems. To stabilize the
advection scheme, then, a number of strategies can be employed, but all involve
adding some numerical dissipation to the scheme. With CG, adding the dissipation
can be a complicated process, but with DG, this dissipation can very naturally be
added through the $\hat{\mathbf{F}}(u_h)$ flux terms. In order to choose an appropriate functional
form for $\hat{\mathbf{F}}(u_h)$ that adds the minimum amount of dissipation to the scheme, we make
use of the results from the well-studied Riemann problem.


3.2.1  Riemann  solvers  for  DG 

This section makes extensive use of chapters 2 and 6 of Hesthaven and Warburton
(2008) and the excellent text by LeVeque (2002). This section only serves as a brief
review, and the reader is referred to LeVeque (2002) for further study.

The Riemann problem is named after Bernhard Riemann, and it involves the
solution of a conservation law together with piecewise constant initial conditions
containing a single discontinuity. The Riemann problem is useful for understanding
hyperbolic systems of equations, because all the properties (such as shocks and
rarefaction waves for the Euler equations) appear as characteristics, or "Riemann
invariants", in the solution of the Riemann problem. When solving conservation laws
in the DG framework, the discontinuity arises at the interface of two elements, where
a jump in the value of the properties occurs, and theory from the Riemann problem
is used to construct the fluxes properly.

A general, non-linear hyperbolic system of equations in two dimensions can be
written as

$$\frac{\partial \mathbf{u}}{\partial t} + \frac{\partial \mathbf{f}_x(\mathbf{u})}{\partial x} + \frac{\partial \mathbf{f}_y(\mathbf{u})}{\partial y} = 0 \tag{3.14}$$

and in the DG context we are interested in finding an approximation for
$\mathbf{F}(\mathbf{u}) \cdot \hat{\mathbf{n}} = \mathbf{f}_x \hat{n}_x + \mathbf{f}_y \hat{n}_y$. Because we are only interested in a one-dimensional flux normal to the
boundary, we can make use of the theory for linearized hyperbolic one-dimensional
systems. The above system is rewritten as

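$$\frac{\partial \mathbf{u}}{\partial t} + \frac{\partial \mathbf{f}_x(\mathbf{u})}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial x} + \frac{\partial \mathbf{f}_y(\mathbf{u})}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial y} = 0$$

$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{A}_x \frac{\partial \mathbf{u}}{\partial x} + \mathbf{A}_y \frac{\partial \mathbf{u}}{\partial y} = 0$$

by using the chain rule and letting $\mathbf{A}_x$ and $\mathbf{A}_y$ be the $d \times d$ Jacobian matrices, where
$d$ is the dimension of the problem. Now we use

$$\mathbf{A} = \mathbf{A}_x \hat{n}_x + \mathbf{A}_y \hat{n}_y$$

and we can consider the one-dimensional system

$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{A} \frac{\partial \mathbf{u}}{\partial \hat{n}} = 0 \tag{3.15}$$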


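where $\mathbf{A}$ is a function of $\mathbf{u}$. Now, hyperbolic systems of equations are characterized
by the fact that $\mathbf{A}$ is diagonalizable with real eigenvalues, that is

$$\mathbf{A} = \mathbf{S} \boldsymbol{\Lambda} \mathbf{S}^{-1} \tag{3.16}$$

$$|\mathbf{A}| = \mathbf{S} |\boldsymbol{\Lambda}| \mathbf{S}^{-1} \tag{3.17}$$

where we have also defined $|\mathbf{A}|$. Here $\boldsymbol{\Lambda}$ is a diagonal matrix with the eigenvalues
on the diagonal, and the columns of $\mathbf{S}$ contain the eigenvectors of $\mathbf{A}$. Multiplying
equation 3.15 by $\mathbf{S}^{-1}$ and setting $\mathbf{S}^{-1}\mathbf{u} = \mathcal{I}$ we have

$$\frac{\partial \mathcal{I}}{\partial t} + \boldsymbol{\Lambda} \frac{\partial \mathcal{I}}{\partial \hat{n}} = 0 \tag{3.18}$$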


where the entries of $\mathcal{I}$ are termed the "Riemann invariants." Through this procedure,
one obtains a decoupled system of equations, where coupling remains only through the
eigenvalues of the system. Each scalar invariant $\mathcal{I}_j$ is advected at the speed $\lambda_j$, where
the speed is in the normal direction if $\lambda_j > 0$ and the speed is opposite the normal
direction when $\lambda_j < 0$. According to the theory of characteristics, the following is the
solution for an initially discontinuous state if using $\hat{\mathbf{n}} = \hat{\mathbf{n}}^+$ and referring to Figure
3-4:

$$\mathcal{I}_j = \mathcal{I}_j^+ \quad \text{if } \lambda_j > 0 \tag{3.19}$$

$$\mathcal{I}_j = \mathcal{I}_j^- \quad \text{if } \lambda_j < 0 \tag{3.20}$$

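With some manipulation (Hesthaven and Warburton, 2008) we can recover the form
for the flux

$$\hat{\mathbf{F}}(\mathbf{u}) \cdot \hat{\mathbf{n}}^+ = \mathbf{A} \{\!\{\mathbf{u}\}\!\} + \frac{1}{2} |\mathbf{A}| [\![\mathbf{u}]\!] \tag{3.21}$$

where $\mathbf{A}$ is a function of both $\mathbf{u}^+$ and $\mathbf{u}^-$ for general non-linear fluxes. For linear
flux functions, the formulation is complete since $\mathbf{A}$ would not be a function of $\mathbf{u}^\pm$.
For non-linear fluxes, what remains is to choose the form for $\mathbf{A}$ and $|\mathbf{A}|$, and this
distinguishes the various approximate Riemann solvers from each other. Note that
$\mathbf{A}\mathbf{u} \approx \mathbf{F}(\mathbf{u}) \cdot \hat{\mathbf{n}}$ is a linearization of the flux. A natural choice yielding a consistent
flux is to let

$$\hat{\mathbf{F}}(\mathbf{u}) \cdot \hat{\mathbf{n}}^+ = \{\!\{\mathbf{F}(\mathbf{u}) \cdot \hat{\mathbf{n}}^+\}\!\} + \frac{1}{2} |\mathbf{A}| [\![\mathbf{u}]\!] \tag{3.22}$$

while appropriately choosing a form for $|\mathbf{A}|$. Two possible choices are

$$|\mathbf{A}| = \left| \mathbf{A}_x(\{\!\{\mathbf{u}\}\!\}) \hat{n}_x + \mathbf{A}_y(\{\!\{\mathbf{u}\}\!\}) \hat{n}_y \right| \tag{3.23}$$

$$|\mathbf{A}| = \left| \{\!\{\mathbf{A}_x(\mathbf{u})\}\!\} \hat{n}_x + \{\!\{\mathbf{A}_y(\mathbf{u})\}\!\} \hat{n}_y \right| \tag{3.24}$$

where care needs to be taken to ensure that $|\mathbf{A}|$ has purely real eigenvalues for 3.24.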
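For a linear system the flux form 3.21 can be checked numerically against direct upwinding of the characteristic variables. The sketch below is an illustration under stated assumptions: it uses a hypothetical linearized one-dimensional shallow-water system with constant Jacobian, interprets the jump in 3.21 along $\hat{\mathbf{n}}^+$ as $\mathbf{u}^+ - \mathbf{u}^-$, and the function names `abs_matrix` and `upwind_flux` are invented for this example.

```python
import numpy as np

def abs_matrix(A):
    """Return |A| = S |Lambda| S^{-1} for a diagonalizable A with real eigenvalues."""
    lam, S = np.linalg.eig(A)
    return (S * np.abs(lam)) @ np.linalg.inv(S)   # scales column j of S by |lam_j|

def upwind_flux(A, u_plus, u_minus):
    """Evaluate A {{u}} + 0.5 |A| (u+ - u-), the flux along n+ for a linear system."""
    u_plus, u_minus = np.asarray(u_plus), np.asarray(u_minus)
    return 0.5 * A @ (u_plus + u_minus) + 0.5 * abs_matrix(A) @ (u_plus - u_minus)

# Hypothetical example: linearized 1D shallow-water system, u = (elevation, velocity)
g, H = 9.81, 10.0
A = np.array([[0.0, H],
              [g, 0.0]])
u_plus = np.array([1.0, 0.0])
u_minus = np.array([0.0, 0.0])
flux = upwind_flux(A, u_plus, u_minus)

# Cross-check: upwind each characteristic variable separately (3.19 and 3.20)
lam, S = np.linalg.eig(A)
I_plus, I_minus = np.linalg.solve(S, u_plus), np.linalg.solve(S, u_minus)
I_upwind = np.where(lam > 0, I_plus, I_minus)
flux_char = S @ (lam * I_upwind)
print(flux, flux_char)
```

Both routes give the same flux, which illustrates that 3.21 is a compact way of taking each Riemann invariant from its upwind side. For this particular matrix $|\mathbf{A}|$ reduces to $\sqrt{gH}$ times the identity, since $\mathbf{A}^2 = gH\,\mathbf{I}$.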


An often used approximate Riemann solver (due to its ease of implementation)
is the local Lax-Friedrichs solver, which assumes that there is one dominating wave
in the system, and enough numerical dissipation is added to stabilize the scheme for
this wave. The Lax-Friedrichs solver is as follows

$$\hat{\mathbf{F}}(\mathbf{u}) \cdot \hat{\mathbf{n}}^+ = \{\!\{\mathbf{F}(\mathbf{u}) \cdot \hat{\mathbf{n}}^+\}\!\} + \frac{1}{2} |\lambda_{max}| [\![\mathbf{u}]\!] \tag{3.25}$$

where $\lambda_{max}$ is the eigenvalue with the largest magnitude. The local Lax-Friedrichs
solver chooses the value of $\lambda_{max}$ locally on an edge.
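A scalar, one-dimensional version of the local Lax-Friedrichs flux is short enough to write out. The sketch below is an assumption-laden illustration rather than the implementation used in this work: the jump along $\hat{n}^+$ is taken as $u^+ - u^-$, $\lambda_{max}$ is taken as the larger of the two one-sided wave speed magnitudes, and the function name and arguments are hypothetical.

```python
def local_lax_friedrichs(u_plus, u_minus, flux, wave_speed, n_plus=1.0):
    """Local Lax-Friedrichs flux along n+ for a scalar 1D conservation law.

    flux(u) is the physical flux and wave_speed(u) = dF/du; n_plus is +1 or -1.
    """
    mean_flux = 0.5 * (flux(u_plus) + flux(u_minus)) * n_plus
    lam_max = max(abs(wave_speed(u_plus)), abs(wave_speed(u_minus)))
    return mean_flux + 0.5 * lam_max * (u_plus - u_minus)

# Linear advection F = a u: the flux reduces to pure upwinding
a = 2.0
f_lin = local_lax_friedrichs(3.0, 1.0, lambda u: a * u, lambda u: a)

# Burgers flux F = u^2 / 2 with wave speed u
f_burgers = local_lax_friedrichs(2.0, 0.0, lambda u: 0.5 * u**2, lambda u: u)
print(f_lin, f_burgers)
```

For linear advection with $a\,\hat{n}^+ > 0$ the result equals $a\,u^+$, the upwind value. For the Burgers example the result (3.0) exceeds the flux $F(u^+) = 2.0$ that exact upwinding would give for this shock, which reflects the extra dissipation that the single-wave assumption introduces.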


3.2.2  Quadrature-free  versus  Quadrature  based  algorithms 


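Consider the numerical implementation of:

$$\int_K \mathbf{F}(u) \cdot \nabla w \, dK \approx \int_K \mathbf{F}\!\left( \sum_i u_{h,i}\,\theta_i(\mathbf{x}) \right) \cdot \nabla \theta_j(\mathbf{x}) \, dK \tag{3.26}$$

or

$$\int_K \mathbf{F}(u) \cdot \nabla w \, dK \approx \int_K \left[ \sum_i \mathbf{F}(u_{h,i})\,\theta_i(\mathbf{x}) \right] \cdot \nabla \theta_j(\mathbf{x}) \, dK \tag{3.27}$$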

Clearly, there are two choices for discretizing the equation. For the first case (3.26) a
quadrature scheme is introduced to perform the volume integral

$$\int_K \mathbf{F}(u) \cdot \nabla w \, dK \approx \sum_k \mathbf{F}\!\left( \sum_i u_{h,i}\,\theta_i(\mathbf{x}_k) \right) \cdot \nabla \theta_j(\mathbf{x}_k) \, W_k \, J(\mathbf{x}_k) \tag{3.28}$$

where $W$ are the Gauss weights and $J(\mathbf{x})$ is a Jacobian (for the coordinate transformation
between the master and current element) evaluated at Gauss point $\mathbf{x}$. For a
non-linear flux $\mathbf{F}$, this scheme cannot be further simplified, and this integration needs
to be performed for every element. The total cost involves a series of matrix-vector
multiplies to interpolate the nodal/modal values onto the Gauss points, followed by
another matrix-vector multiply to perform the integration. Also, normally the number
of Gauss points is taken to be greater than the number of nodes or modes,
resulting in larger matrices and a larger computational expense.

For the second choice of discretization, 3.27, we can write

$$\int_K \mathbf{F}(u) \cdot \nabla w \, dK \approx \sum_i \mathbf{F}(u_{h,i}) \cdot \int_K \theta_i(\mathbf{x}) \nabla \theta_j(\mathbf{x}) \, dK = \sum_i \mathbf{F}(u_{h,i}) \cdot \mathbf{C}^K_{ij} \tag{3.29}$$

where $\mathbf{C}^K_{ij} = \int_K \theta_i(\mathbf{x}) \nabla \theta_j(\mathbf{x}) \, dK$ can be precomputed. What remains is to
calculate the fluxes at the nodal points, or for each mode. Straight-sided triangles
have a constant Jacobian, in which case $\mathbf{C}$ can be precomputed for a reference element
and scaled with $J$ to form $\mathbf{C}^K$. In that case, no quadrature scheme is required, which
means only an evaluation of the fluxes and a single matrix-vector multiplication is
required. This is a significant saving over the quadrature scheme. As an additional
benefit, the $\mathbf{M}^{-1}\mathbf{C}$ matrix can be precomputed, which results in additional savings.
However, the quadrature-free scheme commits a variational crime, which can result in
problems. For instance, the non-linear fluxes may suffer from aliasing errors causing
instabilities in the solution, but these may be remedied by some minor filtering of the
higher modes (Hesthaven and Warburton, 2008) at slightly reduced accuracy and rate
of convergence compared to the quadrature-based scheme. For a discussion on the
magnitude of the aliasing errors, which depend on the smoothness of $u_h$ and $\mathbf{F}(u_h)$,
the reader is referred to Chapter 5 of Hesthaven and Warburton (2008).
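The difference between the two approaches can be seen on a single one-dimensional element. The sketch below is an assumed setup, not the configuration used in this work: a quadratic nodal basis on the reference interval $[-1, 1]$ (unit Jacobian), with hypothetical function names. It evaluates the volume term once with quadrature and once with the precomputed matrix, for a linear and a non-linear (Burgers) flux.

```python
import numpy as np

nodes = np.array([-1.0, 0.0, 1.0])          # quadratic nodal element on [-1, 1]
n = len(nodes)
coeffs = np.linalg.inv(np.vander(nodes, n, increasing=True))

def basis(x):
    """theta_i(x) at points x, shape (len(x), n)."""
    return np.vander(x, n, increasing=True) @ coeffs

def basis_dx(x):
    """d(theta_i)/dx at points x, shape (len(x), n)."""
    dV = np.zeros((len(x), n))
    dV[:, 1:] = np.vander(x, n - 1, increasing=True) * np.arange(1, n)
    return dV @ coeffs

xq, wq = np.polynomial.legendre.leggauss(4)  # exact up to polynomial degree 7
Tq, dTq = basis(xq), basis_dx(xq)

# Precomputed once: C_ij = int theta_i * d(theta_j)/dx dx
C = Tq.T @ (wq[:, None] * dTq)

def volume_term_quadrature(u_nodal, flux):
    """Interpolate u_h to the Gauss points, evaluate the flux there, integrate."""
    return dTq.T @ (wq * flux(Tq @ u_nodal))

def volume_term_quadrature_free(u_nodal, flux):
    """Evaluate the flux at the nodes only and apply the precomputed matrix."""
    return C.T @ flux(u_nodal)

u = np.array([0.0, 1.0, 0.0])
linear = lambda v: 2.0 * v
burgers = lambda v: 0.5 * v**2
print(volume_term_quadrature(u, linear), volume_term_quadrature_free(u, linear))
print(volume_term_quadrature(u, burgers), volume_term_quadrature_free(u, burgers))
```

For the linear flux the two results agree, because the nodal interpolant of the flux is exact. For the Burgers flux they differ (end entries of magnitude 4/15 versus 1/3 in this example), which is the interpolation error behind the aliasing issue discussed above.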


3.3  DG  with  second  order  derivatives 

The extension of DG to higher order derivatives is discussed in this section. Only
the extension to second order equations will be discussed in detail, and for a
generalization to higher orders, the reader is referred to Yan and Shu (2002). We could
consider the problem

$$\frac{\partial u}{\partial t} + \nabla \cdot \mathbf{F}_{inv}(u) + \nabla \cdot \mathbf{F}_{vis}(u, \nabla u) = S(u) \tag{3.30}$$

where $\mathbf{F}_{inv}(u)$ is the functional form for the inviscid fluxes, and $\mathbf{F}_{vis}(u, \nabla u)$ is the
functional form for the viscous fluxes. However, since the advection and source term
have already been discretized in Section 3.2, we shall consider the simpler equation

$$\frac{\partial u}{\partial t} + \nabla \cdot [-\kappa \nabla u] = 0 \tag{3.31}$$

where we have taken $\mathbf{F}_{vis}(u, \nabla u) = -\kappa \nabla u$ with $\kappa$ a constant or some function of $\mathbf{x}$.


A simple choice for the numerical flux $\hat{F}_{vis}$ is obtained by calculating the derivative
of the solution within each element and then using a central flux, that is
$\hat{F}_{vis} = \{\{-\kappa \nabla u_h\}\}$. However, while the discrete matrix is relatively well conditioned, this scheme
was proven to yield unstable results in some cases, for example see Chapter 7 of
Hesthaven and Warburton (2008). Little progress was made until Bassi and Rebay
(1997) suggested rewriting the equation as a system of first-order equations, in which
case 3.31 becomes:

$$\frac{\partial u}{\partial t} + \nabla \cdot q = 0, \quad \text{in } (0,T) \times \Omega \tag{3.32}$$

$$q + \kappa \nabla u = 0, \quad \text{in } \Omega \tag{3.33}$$

$$u = g_D, \quad \text{on } \partial\Omega_D \tag{3.34}$$

$$q \cdot \hat{n} = g_N, \quad \text{on } \partial\Omega_N \tag{3.35}$$

$$u = u_0, \quad \text{in } \Omega \text{ at } t = 0 \tag{3.36}$$


The discretization of 3.32 proceeds the same as the discretization of 3.6 using the
same space $W_h^p$ as defined in 3.10, such that we have

$$\int_K \frac{\partial u_h}{\partial t} w \, dK + \int_{\partial K} \hat{q}_h \cdot \hat{n} \, w \, d\partial K - \int_K q_h \cdot \nabla w \, dK = 0 \tag{3.37}$$

$$\int_K \frac{\partial u_h}{\partial t} w \, dK + \int_{\partial K} [\hat{q}_h - q_h] \cdot \hat{n} \, w \, d\partial K + \int_K \nabla \cdot q_h \, w \, dK = 0 \tag{3.38}$$

in the weak (3.37) and strong (3.38) forms. The discretization of 3.33 proceeds
similarly, but first we need to define a new space

$$V_h^p = \{ v \in (L^2(\Omega))^d : v|_K \in (\mathcal{P}^p(K))^d \ \forall K \in T_h \} \tag{3.39}$$

where $d$ is the dimension of the problem. $V_h^p$ is a vector space, which is different from
the scalar space defined in 3.10. Then proceeding to discretize 3.33 we have


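$$\int_K q_h \cdot v \, dK + \int_K \kappa \nabla u_h \cdot v \, dK = 0$$

$$\int_K \kappa^{-1} q_h \cdot v \, dK + \int_{\partial K} \hat{u}_h \, v \cdot \hat{n} \, d\partial K - \int_K u_h \nabla \cdot v \, dK = 0 \tag{3.40}$$

$$\int_K \kappa^{-1} q_h \cdot v \, dK + \int_{\partial K} [\hat{u}_h - u_h] \, v \cdot \hat{n} \, d\partial K + \int_K \nabla u_h \cdot v \, dK = 0 \tag{3.41}$$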


in  the  weak  (3.40)  and  strong  (3.41)  forms.  What  is  left  is  to  specify  the  form  for  the 
numerical  fluxes.  The  most  general  form  of  the  flux  (which  follows  from  a  stability 
analysis)  as  written  using  the  notation  suggested  by  Castillo  et  al.  (2000),  is  as  follows: 


$$\hat{q}_h = \{\{q_h\}\} - C_{11} [\![u_h \hat{n}]\!] + C_{12} [\![q_h \cdot \hat{n}]\!] \tag{3.42}$$

$$\hat{u}_h = \{\{u_h\}\} - C_{12} \cdot [\![u_h \hat{n}]\!] - C_{22} [\![q_h \cdot \hat{n}]\!] \tag{3.43}$$

The values chosen for the parameters $C_{11}$, $C_{12}$, and $C_{22}$ result in a number
of different schemes for elliptic problems using a DG discretization. Castillo et al.
(2000) also showed that $C_{11} > 0$ and $C_{22} \ge 0$ is required for the DG method to give
a unique approximate solution. Arnold et al. (2002) analyzed some of the available
schemes under a unified framework, and more recent work by Cockburn et al. (2009)
analyzes the existing schemes under the new Hybrid Discontinuous Galerkin (HDG)
framework, which unifies not only DG methods for second order elliptic problems,
but also CG and mixed methods. The following sections briefly describe some of the
popular methods for discretizing elliptic problems using DG.

Unless explicitly stated, the schemes described treat the boundary conditions in
a weak sense. That is, to satisfy the boundary conditions for the set of equations
3.32-3.36 the following is used:

$$\hat{u}_h = g_D, \quad \text{on } \partial\Omega_D \tag{3.44}$$

$$\hat{q}_h = q_h - C_{11}(u_h - g_D)\hat{n}, \quad \text{on } \partial\Omega_D \tag{3.45}$$

$$\hat{u}_h = u_h, \quad \text{on } \partial\Omega_N \tag{3.46}$$

$$\hat{q}_h = g_N \hat{n}, \quad \text{on } \partial\Omega_N \tag{3.47}$$

Also, the majority of schemes take $C_{22} = 0$. This serves to decouple the solution of $u$
and $q$, which means $q$ can be solved independently of $u$, thereby allowing the scheme
to be less computationally expensive. The alternative is to solve for $u$ and $q$
simultaneously, which is a strategy employed by some variants of the Local Discontinuous
Galerkin (LDG) method, and the HDG methods.


3.3.1 Interior Penalty (IP) method

The IP method for discontinuous elements was originally developed by Arnold
(1982), and uses the following

$$C_{12} = 0, \quad C_{11} = \tau, \quad C_{22} = 0$$

$$\hat{q}_h = \{\{\nabla u_h\}\} - \tau [\![u_h \hat{n}]\!] \tag{3.48}$$

where $\tau$ is chosen appropriately according to the application. With this choice of
parameters, $\hat{q}_h$ is penalized by $\tau$ times the jump in $u_h$, and $\hat{u}_h$ uses the average
value of $u_h$ on the edge. The IP method combines sparsity with a low condition
number (comparable to the condition number when using central fluxes (Hesthaven
and Warburton, 2008)) for the discrete operator, which makes IP a popular method.
3.3.2 The Local Discontinuous Galerkin (LDG) method

The LDG method was introduced by Cockburn and Shu (1998a), and uses the
following:

$$C_{12} = \tfrac{1}{2}\hat{n}^{\pm}, \quad C_{11} = \tau, \quad C_{22} = 0 \tag{3.49}$$

With this choice of parameters, $\hat{q}_h$ is penalized by $\tau$ times the jump in $u_h$ and $\tfrac{1}{2}$
times the jump in $q_h$, and $\hat{u}_h$ is penalized by $\tfrac{1}{2}$ the jump in $u_h$. Note that $C_{12}$
can be arbitrarily chosen to be associated with either $\hat{n}^+$ or $\hat{n}^-$, the only restriction
being that, within each element $K$, at least one of the edges has to associate $C_{12}$
with a different normal than the other edges. If this criterion is
not satisfied, the scheme can still be stable for non-zero $\tau$, but if it is satisfied the
scheme is stable for $\tau = 0$ and gives the minimum dissipation scheme (Hesthaven and
Warburton, 2008). Normally $\tau$ is chosen to be zero in the interior, and non-zero on the
boundary of the domain. A larger value of $\tau$ on the boundary enforces the boundary
conditions more strictly, where in the limit of $\tau = \infty$ the boundary conditions are
enforced in a strong sense.

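Using the LDG fluxes, we can rewrite $\hat{u}_h = \{\{u_h\}\} - C_{12} \cdot [\![u_h \hat{n}]\!]$ and
$\hat{q}_h = \{\{q_h\}\} - C_{11}[\![u_h \hat{n}]\!] + C_{12}[\![q_h \cdot \hat{n}]\!]$ in a simpler form, in which $C_{12}$ is
treated as a scalar equal to $\pm 1$ on each edge:

$$\hat{u}_h = \frac{u_h^+ + u_h^-}{2} - \frac{C_{12}}{2}\hat{n}^+ \cdot [u_h^+ \hat{n}^+ - u_h^- \hat{n}^+], \qquad \hat{q}_h = \frac{q_h^+ + q_h^-}{2} - C_{11}[\![u_h \hat{n}]\!] + \frac{C_{12}}{2}[q_h^+ \cdot \hat{n}^+ - q_h^- \cdot \hat{n}^+]\hat{n}^+ \tag{3.50}$$

$$\hat{u}_h = \frac{u_h^+ + u_h^-}{2} - C_{12}\frac{u_h^+ - u_h^-}{2}, \qquad \hat{q}_h = \frac{q_h^+ + q_h^-}{2} + C_{12}\frac{q_h^+ - q_h^-}{2} - C_{11}[\![u_h \hat{n}]\!] \tag{3.51}$$

$$\hat{u}_h = \frac{(-C_{12} + 1)}{2}u_h^+ - \frac{(-C_{12} - 1)}{2}u_h^-, \qquad \hat{q}_h = \frac{(C_{12} + 1)}{2}q_h^+ - \frac{(C_{12} - 1)}{2}q_h^- - C_{11}[\![u_h \hat{n}]\!] \tag{3.52}$$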


This form highlights the "flip-flop" nature of the scheme, that is, if $\hat{u}_h$ is chosen from
$K^+$, $\hat{q}_h$ is chosen from $K^-$. The criterion for enabling a stable minimum dissipation
scheme for $C_{12}$ is then

$$\left| \sum_{\partial K} C_{12} \right| < N_{\partial K}, \quad \forall K \in T_h \tag{3.53}$$

where $N_{\partial K}$ is the number of edges in the element. With a clever implementation, $C_{12}$
can be taken as $C_{12} = 1$ (or $= -1$) on all the edges (as defined globally) while still
enforcing the criterion to obtain the minimum dissipation scheme.

The minimum dissipation property makes LDG an attractive scheme for solving
second order elliptic equations using DG. However, unlike the IP scheme, for example,
it has a non-compact stencil in higher dimensions, which means it takes information
from elements that are not direct neighbours. The problem can be overcome by
slightly modifying the fluxes, which is achieved by using the Compact Discontinuous
Galerkin (CDG) method described next. Finally, the condition number of the discrete
global operator is approximately twice as large as the condition number for the IP
method (Hesthaven and Warburton, 2008).
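The flip-flop behaviour of the LDG fluxes can be checked numerically on a single interface. The following is a minimal one-dimensional sketch, assuming the flux forms of equations 3.42 and 3.43 with the sign conventions used in this section; the function name `dg_traces_1d` and the sample trace values are hypothetical and chosen only for illustration. In one dimension the vector $C_{12}$ reduces to a signed scalar.

```python
def dg_traces_1d(u_plus, u_minus, q_plus, q_minus, n_plus, c11, c12, c22=0.0):
    """Numerical traces (u_hat, q_hat) on one 1D interface, eqs. 3.42-3.43.

    n_plus is the outward normal of the '+' element (+1 or -1); the '-'
    element has the opposite normal. c12 is the 1D (scalar) version of C12.
    """
    n_minus = -n_plus
    avg_u = 0.5 * (u_plus + u_minus)                 # {{u_h}}
    avg_q = 0.5 * (q_plus + q_minus)                 # {{q_h}}
    jump_un = u_plus * n_plus + u_minus * n_minus    # [[u_h n]]
    jump_qn = q_plus * n_plus + q_minus * n_minus    # [[q_h . n]]
    q_hat = avg_q - c11 * jump_un + c12 * jump_qn
    u_hat = avg_u - c12 * jump_un - c22 * jump_qn
    return u_hat, q_hat


# LDG with tau = 0 and C12 = +n_plus/2: u_hat comes from '-', q_hat from '+'.
u_hat, q_hat = dg_traces_1d(1.0, 3.0, 10.0, 20.0, n_plus=1.0, c11=0.0, c12=0.5)
print(u_hat, q_hat)
```

Switching the sign of `c12` swaps the sides from which the two traces are taken, and setting `c12 = 0` recovers the central averages used by the IP-type choice of parameters.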


3.3.3  The  Compact  Discontinuous  Galerkin  (CDG)  method 

The CDG method is a modification of the LDG method developed by Peraire
and Persson (2007). A compact stencil is achieved in CDG by carefully studying
how the non-compactness of the $\hat{q}_h$ fluxes arises in LDG. Referring to Figure 3-5 the
non-compactness of the LDG method can be explained.

Consider element 1. On the edge a, $\hat{u}_h$ is taken as $u_h^1$ for calculating $q_h^1$ and $q_h^2$.
However, on edge b, $\hat{u}_h$ is taken as $u_h^3$, which means the calculation of $q_h^1$ also contains
information from element 3. Due to the flip-flop nature of LDG, on edge a, $\hat{q}_h$ is then
taken as $q_h^1$ which contains information from element 3, thereby resulting in a
non-compact scheme. The CDG method recognizes this situation, and to remedy the
problem it saves a version of $q_h^1$ that does not include the information from element
3.

Figure 3-5: Non-compactness of LDG in multiple dimensions.

The fluxes are, then, the same as the LDG fluxes, except that $q_h$ is replaced by $q_h^e$
which contains only information from neighboring elements, resulting in a compact
stencil:

$$C_{12} = \tfrac{1}{2}\hat{n}^{\pm}, \quad C_{11} = \tau, \quad C_{22} = 0$$

$$\hat{q}_h^e = \{\{q_h^e\}\} - C_{11}[\![u_h \hat{n}]\!] + C_{12}[\![q_h^e \cdot \hat{n}]\!] \tag{3.54}$$

The flux can be written in terms of a scalar $C_{12} = \pm 1$ as

$$\hat{q}_h^e = \frac{(C_{12} + 1)}{2}q_h^{e,+} - \frac{(C_{12} - 1)}{2}q_h^{e,-} - C_{11}[\![u_h \hat{n}]\!] \tag{3.55}$$

CDG retains all the theoretical properties of LDG, and numerical experiments have
shown that the stability for CDG may even be enhanced compared to the stability of
LDG. The main advantage of CDG is the more compact stencil for efficient numerical
treatment of implicit time integration schemes, at the cost of slightly more expensive
flux evaluations for matrix-free iterative methods.

3.3.4  Hybrid  Discontinuous  Galerkin  (HDG)  method 

The HDG method is the newest method discussed, with Cockburn et al. (2009) and
Nguyen et al. (2009) being the main references in the literature. The derivation of
HDG methods is significantly different from the previous methods, but can be recast
in the more standard form as follows:

$$C_{11} = \frac{\tau^+ \tau^-}{\tau^+ + \tau^-}, \quad C_{12} = \frac{1}{2}\left(\frac{[\![\tau \hat{n}]\!]}{\tau^+ + \tau^-}\right), \quad C_{22} = \frac{1}{\tau^+ + \tau^-} \tag{3.56}$$

where the parameter $\tau$ is non-uniquely defined on each edge (the values $\tau^+$ and $\tau^-$
from the two adjacent elements may differ). With this choice of
parameters, $\hat{q}_h$ is penalized by both the jump in $u_h$ and $q_h$, and $\hat{u}_h$ is also penalized
by both the jump in $u_h$ and $q_h$.

The derivation of the HDG scheme is as follows. First we define $e = \partial K^+ \cap \partial K^-$
in the interior and $e = \partial K \cap \partial\Omega$ on the boundary. Then we define a new space:

$$M_h^p = \{ \mu \in L^2(\varepsilon_h) : \mu|_e \in \mathcal{P}^p(e) \ \forall e \in \varepsilon_h \} \tag{3.57}$$

That is, the space $M_h^p$ is continuous on each edge of $\varepsilon_h$ (as opposed to $V_h^p$ and $W_h^p$
which are discontinuous). HDG is derived by noting that each element can be solved
independently of the other triangles if the value at the boundary of the element $\hat{u}$
is known. The boundary data $\hat{u}$ can be expressed in terms of a new variable that
we define as $\lambda_h \in M_h^p(0)$, where the notation $M_h^p(0)$ refers to the space that is
zero-valued on the boundaries of the domain:

$$\hat{u}_h = \begin{cases} P g_D, & \text{on } \varepsilon_h^{\partial} \\ \lambda_h, & \text{on } \varepsilon_h^{o} \end{cases} \tag{3.58}$$

Here $P$ is an operator that projects the boundary data, $g_D$, onto $\hat{u}$. Note that the
boundary conditions for HDG are enforced strongly for $\hat{u}_h$, but this translates to a
weak enforcement inside the elements. The form for the flux $\hat{q}_h$ is then chosen as in
Nguyen et al. (2009)

$$\hat{q}_h = q_h + \tau(u_h - \hat{u}_h)\hat{n} \tag{3.59}$$

where $\tau$ can be taken as $\tau = O(1)$ for optimal convergence of elliptic problems.
This flux definition is not unique, but this choice gives convergent, stable results for
properly chosen values of $\tau$. This allows the local problems to be written completely
in terms of the boundary data $\hat{u}_h$. The boundary data $\hat{u}_h$ then needs to be solved as
a coupled set of equations that enforce conservativity of the fluxes between elements.
The global equation solved is then

$$\int_{\varepsilon_h} [\![\hat{q}_h \cdot \hat{n}]\!] \mu \, d\varepsilon_h = \int_{\partial\Omega_N} g_N \mu \, d\partial\Omega_N, \quad \forall \mu \in M_h^p(0) \tag{3.60}$$


This  solution  method  involves  three  steps: 


1.  The  inversion  of  a  local  operator  on  each  element  to  form  the  right-hand-side 
vector  (and  global  matrix) 


2.  The  global  solution  of  the  boundary  data 


3.  The  local  reconstruction  of  the  solution  on  the  element 


The local operations are cheap because inversions are done on small dense matrices,
while the global solve contains considerably fewer unknowns than the original system
that would be obtained from an IP or CDG method. This procedure dramatically
increases the efficiency of solving elliptic problems with DG where implicit time
integration is required. Implicit time integration may be necessary to overcome stringent
numerical stability criteria which limit the timestep size for explicit schemes.
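To give a feel for the reduction in globally coupled unknowns, the sketch below simply counts degrees of freedom for a scalar problem on triangles. It assumes polynomials of degree $p$ on both elements and edges, and a hypothetical structured mesh made of an n-by-n grid of squares, each split into two triangles; the function names are illustrative and boundary edges are included in the HDG count for simplicity.

```python
def dg_unknowns(n_elements, p):
    """Globally coupled unknowns for a standard DG method (e.g. IP or CDG):
    (p+1)(p+2)/2 degrees of freedom on every triangle."""
    return n_elements * (p + 1) * (p + 2) // 2


def hdg_global_unknowns(n_edges, p):
    """Globally coupled unknowns for HDG: p+1 degrees of freedom per edge."""
    return n_edges * (p + 1)


def triangulated_square_counts(n):
    """Element and edge counts for an n-by-n grid of squares cut into triangles."""
    n_elements = 2 * n * n
    n_edges = 3 * n * n + 2 * n   # horizontal + vertical + diagonal edges
    return n_elements, n_edges


n_elements, n_edges = triangulated_square_counts(10)
for p in (1, 2, 4, 6):
    print(p, dg_unknowns(n_elements, p), hdg_global_unknowns(n_edges, p))
```

Under these assumptions the advantage grows with the polynomial degree: the element count scales with $p^2$ while the edge count scales with $p$. For $p = 1$ on this mesh the HDG global system is not smaller than the DG one.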

As an additional benefit with this method, both $u_h$ and $q_h$ converge at the optimal
rate of $O(p + 1)$ when $\tau = O(1)$, which allows a post-processed solution $u_h^*$ which
converges at $O(p + 2)$. This property is lost for large values of $\tau$. The post-processing
can be achieved by solving the following diffusion equation locally on each element:

$$\int_K \kappa \nabla u_h^* \cdot \nabla w^* \, dK = -\int_K q_h \cdot \nabla w^* \, dK, \quad \forall w^* \in \mathcal{P}^{p+1}(K) \tag{3.61}$$

$$\int_K u_h^* \, dK = \int_K u_h \, dK \tag{3.62}$$

For additional details about HDG, the reader is referred to Nguyen et al. (2009).
3.4 Implementation issues

Implementation issues related to the non-linear fluxes have already been discussed
in Section 3.2.2. The remaining issues include the data structures (see Persson and
Peraire (2006)), which will not be discussed, the solution method for inverting large
matrices, which is the topic of Chapter 5, and the type of basis to use, which is
the topic of this section.

An example of a simple modal basis set in one dimension of order $p$ is the monomial
basis $x^{i-1}$ (for $i = 1, \dots, p+1$). Here the unknown coefficients are related to each mode, and are not
directly related to the solution.

Another popular basis set is the nodal basis, which is related to a set of $p + 1$
points $x$ for a $p^{th}$ order basis set. An example of a nodal basis in one dimension is the
Lagrange polynomial $\ell_i(r) = \prod_{j=1, j \neq i}^{p+1} \frac{r - x_j}{x_i - x_j}$ with the property $\ell_i(x_j) = \delta_{ij}$, where
$r$ is an arbitrary point. That is, the basis related to point $x_i$ is equal to one at $x_i$ and
zero at all the other points $x_j \neq x_i$.
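The defining property of the one-dimensional Lagrange basis is easy to verify directly. The following is a small sketch; the function name `lagrange_basis` and the three equispaced nodes are illustrative choices, not taken from the implementation used in this work.

```python
def lagrange_basis(i, r, nodes):
    """Value at the point r of the Lagrange polynomial attached to nodes[i]."""
    value = 1.0
    for j, x_j in enumerate(nodes):
        if j != i:
            value *= (r - x_j) / (nodes[i] - x_j)
    return value


nodes = [-1.0, 0.0, 1.0]   # p + 1 = 3 nodes, so a second order basis

# Each basis function is one at its own node and zero at the others.
for i in range(len(nodes)):
    print([lagrange_basis(i, x_j, nodes) for x_j in nodes])

# At any other point the basis functions sum to one.
print(sum(lagrange_basis(i, 0.3, nodes) for i in range(len(nodes))))
```

Because the coefficients multiply basis functions with this property, reading the solution at a node amounts to reading the corresponding coefficient.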

There are some advantages and disadvantages to each approach (Karniadakis and
Sherwin, 2005). The modal approach handles p-adaptivity (adapting the order of
the basis function) more naturally, but requires a function evaluation of all the bases
to find the value of $u_h$ at any point in the element. The nodal approach has the
advantage that the expansion coefficients are equal to the approximate value of the
function at the specified nodal points $x$. Thus, to determine the value of $u_h$ at the
nodal points, no function evaluation is needed and simply the coefficient needs to be
read from memory.

For the purposes of this work, a nodal basis will be used. The details of creating
this nodal basis are discussed in Hesthaven and Warburton (2008), and a similar
implementation is used for this work. The procedure is somewhat complicated
because a closed-form analytical solution for a nodal basis on a triangle does not exist.
That is, given a set of nodes and the order of a basis, an analytical expression
describing the shape of all the basis functions has not been found. In order to
construct the discrete operators for a nodal basis, an appropriate modal basis is used.
This works because the modal basis and nodal basis can be related to each other
due to the uniqueness of the polynomial representation, that is
$u_h(x) = \sum_j u_j^m \psi_j(x) = \sum_j u_j^n \phi_j(x)$.
The overall procedure for using a nodal basis to construct the FEM operators is as
follows:

1. Solve for the modal coefficients which relate the modal and nodal bases

2. Evaluate the modal basis at specified points and, using the modal coefficients
from step 1, find the values of the nodal basis at the specified points

3. Evaluate the derivatives of the modal basis at specified points and, using the
modal coefficients from step 1, find the values of the derivatives of the nodal
basis at the specified points

4. Construct the finite-element operators, such as the mass matrix, using a
combination of steps 1, 2, and 3. Save the constructed matrices for later use.

First, the coefficients which relate the modal and nodal basis functions are found.
In order to do this, the modal Koornwinder basis (Koornwinder, 1991), which is an
orthogonal polynomial basis constructed using Jacobi polynomials, is used to find the
coefficients $u^m$. In what follows we use $(\cdot)^m$ and $(\cdot)^n$ to differentiate between the
coefficients for the modal basis and nodal basis respectively. Then

$$V u^m = u^n \tag{3.63}$$

where $V$ is the Vandermonde matrix (see Trefethen and Bau (1997)) with $V_{ij} = \psi_j(x_i)$,
that is, the $j^{th}$ modal function evaluated at the $i^{th}$ nodal point. We know that we
want the value of the $i^{th}$ nodal basis to be 1 at the $i^{th}$ nodal point ($x_i$), and zero for
all the other nodal points $x_j \neq x_i$. We can then solve for all the modes that will give
this polynomial basis

$$u^m = V^{-1} u^n \tag{3.64}$$

where $u^n$ is a zero vector except for a 1 at the $i^{th}$ entry.

Second, with this modal polynomial basis in place and the coefficients relating the
two bases, the value of the nodal basis can be evaluated at specific points. All that
is required is the value of the modal basis at the desired points and the solved values
of the modal coefficients, $u^m$, from the first step. The values of the nodal basis can
be found by multiplying through with another Vandermonde matrix

$$V_{gpts} u_j^m = \phi_j(x_{gpts}) \tag{3.65}$$

$$V_{gpts} V^{-1} = V_{N,gpts} \tag{3.66}$$

where $V_{gpts}$ is the Vandermonde matrix for the modal basis that evaluates the
function at, for example, the Gauss points $x_{i,gpts}$, and $V_{N,gpts} = \phi_j(x_{i,gpts})$ is the
Vandermonde matrix for the nodal basis.

Third, we can similarly get the value of the derivatives of the nodal basis at the
Gauss points by finding the derivatives of the modal function, and using

$$V_{\xi,gpts} V^{-1} = V_{\xi,N,gpts} \tag{3.67}$$

where subscript $(\cdot)_{\xi}$ indicates a derivative with respect to coordinate $\xi$.

Lastly,  using  the  above  method,  the  discrete  elemental  operators,  such  as  the  mass 
matrix  M,  can  be  created  and  stored. 
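The four steps above can be sketched in one dimension with NumPy. This is only an illustration of the procedure, not the triangle-based implementation used in this work: it assumes an orthonormal Legendre basis in place of the Koornwinder basis, Chebyshev-Gauss-Lobatto points as the nodal set, and hypothetical helper names.

```python
import numpy as np
from numpy.polynomial import legendre


def modal_vander(x, p):
    """V_ij = psi_j(x_i): Legendre polynomials scaled to unit L2 norm on [-1, 1]."""
    scale = np.sqrt((2.0 * np.arange(p + 1) + 1.0) / 2.0)
    return legendre.legvander(x, p) * scale


def modal_grad_vander(x, p):
    """Derivatives of the orthonormal Legendre modes evaluated at the points x."""
    scale = np.sqrt((2.0 * np.arange(p + 1) + 1.0) / 2.0)
    columns = [legendre.legval(x, legendre.legder(np.eye(p + 1)[j]))
               for j in range(p + 1)]
    return np.stack(columns, axis=1) * scale


p = 4
nodes = -np.cos(np.pi * np.arange(p + 1) / p)   # assumed nodal set
V = modal_vander(nodes, p)

# Step 1: column i holds the modal coefficients of nodal basis i (V u^m = e_i).
modal_coeffs = np.linalg.solve(V, np.eye(p + 1))

# Step 2: nodal basis values at the Gauss points, V_N,gpts = V_gpts V^-1.
gauss_x, gauss_w = legendre.leggauss(p + 1)
V_N_gpts = modal_vander(gauss_x, p) @ modal_coeffs

# Step 3: derivatives of the nodal basis at the Gauss points.
dV_N_gpts = modal_grad_vander(gauss_x, p) @ modal_coeffs

# Step 4: mass matrix M_ij = integral of phi_i phi_j by Gauss quadrature.
M = V_N_gpts.T @ (gauss_w[:, None] * V_N_gpts)
print(np.round(M.sum(), 12))   # integral of 1 over [-1, 1]
```

With an orthonormal modal basis the mass matrix also equals $(V V^T)^{-1}$, which gives a quadrature-free way to form it and a convenient consistency check.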

Two remaining issues include the choice of nodal locations and the condition
number of $V$. A well-behaved set of nodal locations allows accurate interpolation and
a poorly-behaved set results in interpolation with large oscillations. The condition
number of $V$ for the modal basis can affect the accuracy when forming the discrete
operators. The first issue is dealt with by optimizing a set of nodal points to minimize
the Lebesgue constant (see Chapter 6 of Hesthaven and Warburton (2008)), and the
second is resolved by using an orthogonal modal basis such as the Koornwinder basis
(Hesthaven and Warburton, 2002).
Chapter 4


Biogeochemical  Reaction  Equation 

4.1  Introduction 

The focus of this chapter is accurate numerical modeling of biogeochemical
processes in the ocean, to demonstrate understanding of DG FEM. For an introduction
to biogeochemical modelling, the reader is referred to Fennel and Neumann (2004).
This work is important in order to predict biological events such as plankton blooms.
Blooms of toxic phytoplankton can be harmful to humans and marine life. Also,
phytoplankton blooms are accompanied by an increased population of fish which can
be harvested for human consumption. Finally, being able to predict biogeochemical
ocean processes enables the study of these processes which can lead to a better
understanding of the ocean ecosystem.

Biogeochemical models may contain a large number of biological or chemical
constituents. The simplest models often only use Nutrient, Phytoplankton, and
Zooplankton as constituents, and are commonly called NPZ models. More
complicated models (Besiktepe et al., 2002) can be adaptive and contain, for example, 24
constituents. Each constituent requires the solution of advection-diffusion-reaction
(ADR) equations, governed by

$$\frac{\partial \phi_i}{\partial t} + \nabla \cdot (u \phi_i) - \nabla \cdot \kappa \nabla \phi_i = S_i(\phi_1, \dots, \phi_n) \tag{4.1}$$

where $u = [u^x(x,y,t) \ u^y(x,y,t)]$ is the velocity; $\kappa$ is the diffusivity; and
$\phi = [\phi_1, \phi_2, \dots, \phi_n]$ is a vector of constituents, $i$ referring to the $i^{th}$ constituent, and
$n$ being the total number of constituents. Note that the source term for constituent
$i$ can be a function of any number of the constituents. The source terms $S_i$
describe "reactions", and often lead to chaotic dynamics.

For this work a simple NPZ model (Flierl and McGillicuddy, 2002) is used with
the following source terms

$$S_N(\phi_N, \phi_P, \phi_Z) = -U e^{z/h} \frac{\phi_P \phi_N}{\phi_N + k_s} + d_P \phi_P + d_Z \phi_Z + (1 - a)\frac{g}{\nu}\phi_Z(1 - e^{-\nu \phi_P}) \tag{4.2}$$

$$S_P(\phi_N, \phi_P, \phi_Z) = U e^{z/h} \frac{\phi_P \phi_N}{\phi_N + k_s} - d_P \phi_P - \frac{g}{\nu}\phi_Z(1 - e^{-\nu \phi_P}) \tag{4.3}$$

$$S_Z(\phi_N, \phi_P, \phi_Z) = -d_Z \phi_Z + a\frac{g}{\nu}\phi_Z(1 - e^{-\nu \phi_P}) \tag{4.4}$$

where the parameters are explained in Table 4.1, the subscripts $(\cdot)_N$, $(\cdot)_P$, $(\cdot)_Z$ refer
to Nutrients, Phytoplankton, and Zooplankton respectively, and lowercase $z$ refers
to the depth coordinate decreasing towards the bottom and taken as $z = 0$ at the
surface. Note that three equations are not required, and the third constituent could
be calculated from a conservation equation. For instance, we could calculate $\phi_N$
using:

$$\phi_N = N_T - \phi_P - \phi_Z \tag{4.5}$$

where $N_T$ is the total biomass, which in general is a function of time and space,
but is taken as a constant. In this work all three equations are solved, where the
conservation equation 4.5 serves as a check of the conservativity of the numerical
scheme.
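The conservation property behind equation 4.5 follows from the source terms summing to zero. The following is a minimal sketch, assuming the source terms have the form of equations 4.2 to 4.4 and using parameter set 1 from Table 4.1; the function name `npz_sources`, the dictionary name, and the sample concentrations are hypothetical.

```python
import numpy as np

# Parameter set 1 of Table 4.1 (rates in 1/day, depth scale in m).
PARAMS_SET_1 = dict(U=0.6, ks=0.1, dP=0.016, dZ=0.08, g=0.1, a=0.4, h=17.0, nu=0.1)


def npz_sources(phi_N, phi_P, phi_Z, z, U, ks, dP, dZ, g, a, h, nu):
    """Source terms S_N, S_P, S_Z at depth z (z = 0 at the surface, negative below)."""
    uptake = U * np.exp(z / h) * phi_P * phi_N / (phi_N + ks)     # light-limited uptake
    grazing = (g / nu) * phi_Z * (1.0 - np.exp(-nu * phi_P))      # Ivlev grazing
    S_N = -uptake + dP * phi_P + dZ * phi_Z + (1.0 - a) * grazing
    S_P = uptake - dP * phi_P - grazing
    S_Z = -dZ * phi_Z + a * grazing
    return S_N, S_P, S_Z


S_N, S_P, S_Z = npz_sources(2.0, 2.0, 1.0, z=-10.0, **PARAMS_SET_1)
print(S_N + S_P + S_Z)   # zero up to round-off: total biomass is conserved
```

Since the three sources cancel at every point, any departure of $\phi_N + \phi_P + \phi_Z$ from $N_T$ in a simulation without net boundary fluxes points to the numerical scheme rather than to the biology.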

This chapter is organized such that individual components of the complete
numerical solution are solved in the sections leading up to the final section, which solves
the full ADR equations.
Parameter  Description                                   Value 1/Value 2  [units]
U          Phytoplankton uptake rate                     0.6              [1/day]
k_s        Saturation rate of phytoplankton              0.1              [μmol/L]
d_P        Mortality rate of Phytoplankton               0.016            [1/day]
d_Z        Mortality rate of Zooplankton                 0.08/0.06        [1/day]
g          Grazing rate of Zooplankton                   0.1/0.13         [L/(μmol day)]
a          Assimilation (efficiency) rate                0.4              []
h          e-folding depth for light (photosynthesis)    17               [m]
ν          Parameter for Ivlev form of grazing function  0.1              [L/μmol]
N_T        Total biomass                                 5                [μmol/L]

Table 4.1: NPZ equation parameter description and values


4.2  Test  Problem  setup 


The domain of interest is shown in Figure 4-1. The depth of the domain is taken as
100 units, and the width of the domain is taken as 100 units. The bottom bathymetry
contains a half-ellipse with axes of 20 units in $x$ and 100 units in the vertical, centered
at $(x,y) = (0,-100)$. This elliptical obstacle models an idealized seamount which
perturbs the flow. A steady potential flow field is used, and the potential flow is
calculated numerically using an IP DG method (see Section 3.3.1) with MATLAB code
provided by Hesthaven and Warburton (2008). For the velocity field, slip boundary
conditions are used on the top and bottom boundaries by specifying the value of
the streamline at those boundaries, and periodic conditions are used for the left
and right sides of the domain. Streamlines of the flow field are plotted in Figure 4-1.
Specifically we take

$$\psi(x, z = 0) = 0 \tag{4.6}$$

$$\psi(x, z = \text{bottom}) = -100 \tag{4.7}$$

$$\psi(x = -50, z) = \psi(x = 50, z) \tag{4.8}$$

where $\psi$ is the stream function, giving a flow from left to right. The solved potential
flow field is scaled by a factor of two, which is equivalent to multiplying $\psi$ by two, or
taking $\psi(x, z = \text{bottom}) = -200$.
Figure 4-1: Problem domain for two-dimensional biogeochemical reaction equations,
with velocity streamlines plotted.

For the biological constituents, no-flux boundary conditions are specified at the
top and bottom boundaries, and periodic conditions are also taken at the left and
right walls. The initial concentrations are calculated from a steady-state solution for
no flow, and the equations are integrated for 100 days.

For this idealized study, the diffusion terms are set to zero, that is $\kappa = 0$, which
allows fully explicit methods to be used, because without diffusion the numerical
stability criterion does not severely restrict the time step.

4.3  One-dimensional  NPZ  equations 

Before solving the full test problem, the NPZ equations are solved in depth to
examine the behaviour of the non-linear source terms. First the steady state solution
is calculated in order to initialize the numerical solution with a biologically-feasible
field, and second time integration schemes are examined to ensure the solution is
stable for the chosen parameter sets.

4.3.1  Steady  State  Solution 

The steady state solutions for this model in depth can be solved for by setting the
left hand side of equation 4.1 equal to zero and substituting the form of the source
terms:

$$0 = -U e^{z/h} \frac{\phi_P \phi_N}{\phi_N + k_s} + d_P \phi_P + d_Z \phi_Z + (1 - a)\frac{g}{\nu}\phi_Z(1 - e^{-\nu \phi_P}) \tag{4.9}$$

$$0 = U e^{z/h} \frac{\phi_P \phi_N}{\phi_N + k_s} - d_P \phi_P - \frac{g}{\nu}\phi_Z(1 - e^{-\nu \phi_P}) \tag{4.10}$$

$$0 = -d_Z \phi_Z + a\frac{g}{\nu}\phi_Z(1 - e^{-\nu \phi_P}) \tag{4.11}$$

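Add 4.9 and 4.10 together to find the steady state values of $\phi_P$:

$$0 = -U e^{z/h} \frac{\phi_P \phi_N}{\phi_N + k_s} + U e^{z/h} \frac{\phi_P \phi_N}{\phi_N + k_s} + d_P \phi_P - d_P \phi_P + d_Z \phi_Z + (1 - a)\frac{g}{\nu}\phi_Z(1 - e^{-\nu \phi_P}) - \frac{g}{\nu}\phi_Z(1 - e^{-\nu \phi_P})$$

$$0 = \phi_Z \left( d_Z - a\frac{g}{\nu}(1 - e^{-\nu \phi_P}) \right)$$

$$\phi_P^* = -\frac{1}{\nu} \ln\left( \frac{a g - \nu d_Z}{a g} \right) \tag{4.12}$$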


Given the biological parameters, equation 4.12 can be used to solve for the steady
state value of $\phi_P^*$, which is constant with depth. The $(\cdot)^*$ notation here indicates that
$\phi_P^*$ is not the value of $\phi_P$ used for initialization at all depths, because the solution
$\phi_P^*$ is only valid down to a certain depth beyond which light does not penetrate, and this
depth still needs to be calculated. However, in the remainder of the derivation, $\phi_P^*$ is
treated as a known constant. Re-arranging equation 4.10 for $\phi_Z$

$$\phi_Z = \underbrace{\frac{\nu U e^{z/h} \phi_P^*}{g(1 - e^{-\nu \phi_P^*})}}_{D(z)} \frac{\phi_N}{\phi_N + k_s} - \underbrace{\frac{\nu d_P \phi_P^*}{g(1 - e^{-\nu \phi_P^*})}}_{K_{\phi_P}} \tag{4.13}$$
and substituting this expression into 4.5:

$$\phi_N = N_T - D(z)\frac{\phi_N}{\phi_N + k_s} + K_{\phi_P} - \phi_P^*$$

$$\phi_N^2 + k_s \phi_N = N_T(\phi_N + k_s) - D(z)\phi_N + K_{\phi_P}(\phi_N + k_s) - \phi_P^*(\phi_N + k_s)$$

$$0 = \phi_N^2 + \phi_N \underbrace{(k_s - N_T + D(z) - K_{\phi_P} + \phi_P^*)}_{B(z)} + \underbrace{(-K_{\phi_P} - N_T + \phi_P^*) k_s}_{C_{\phi_P}}$$

$$\phi_N^* = \frac{-B(z) \pm \sqrt{B(z)^2 - 4 C_{\phi_P}}}{2} \tag{4.14}$$


With equations 4.12 and 4.14, the initial condition can then be numerically calculated,
given a numerical array of depth values $z$:

$$\phi_Z = \max(N_T - \phi_N^*(z) - \phi_P^*, 0) \tag{4.15}$$

$$\phi_P = \phi_P^* \frac{\phi_Z}{\max(\phi_Z, 10^{-16})} \tag{4.16}$$

$$\phi_N = N_T - \phi_P - \phi_Z \tag{4.17}$$

Using this procedure the initial steady state can be calculated numerically and used to
initialize simulations. When the correct root for $\phi_N^*$ is chosen, a stable equilibrium is
obtained, but this is not the only equilibrium of the system. Burton (2009) describes
the other equilibria of this particular NPZ model, and also examines the stability of
the system for various parameters.
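The initialization procedure can be sketched as follows. This is a minimal illustration assuming parameter set 1 of Table 4.1, the positive root of equation 4.14, and the convention that $\phi_P$ is set to zero wherever the zooplankton concentration is clipped to zero; these choices, and the function name `npz_steady_state`, are assumptions made for the example.

```python
import numpy as np


def npz_steady_state(z, N_T=5.0, U=0.6, ks=0.1, dP=0.016, dZ=0.08,
                     g=0.1, a=0.4, h=17.0, nu=0.1):
    """Steady-state depth profiles (phi_N, phi_P, phi_Z) following eqs. 4.12-4.17."""
    phi_P_star = -np.log(1.0 - nu * dZ / (a * g)) / nu           # eq. 4.12
    ivlev = g * (1.0 - np.exp(-nu * phi_P_star))
    D = nu * U * np.exp(z / h) * phi_P_star / ivlev              # D(z) in eq. 4.13
    K = nu * dP * phi_P_star / ivlev                             # K in eq. 4.13
    B = ks - N_T + D - K + phi_P_star
    C = (-K - N_T + phi_P_star) * ks
    phi_N_star = 0.5 * (-B + np.sqrt(B**2 - 4.0 * C))            # assumed positive root
    phi_Z = np.maximum(N_T - phi_N_star - phi_P_star, 0.0)       # eq. 4.15
    phi_P = np.where(phi_Z > 0.0, phi_P_star, 0.0)               # assumed cut-off
    phi_N = N_T - phi_P - phi_Z                                  # eq. 4.17
    return phi_N, phi_P, phi_Z


z = np.linspace(-100.0, 0.0, 101)      # depth array, surface at z = 0
phi_N, phi_P, phi_Z = npz_steady_state(z)
print(phi_N[-1], phi_P[-1], phi_Z[-1])   # surface values
```

Below the depth at which the clipped zooplankton concentration reaches zero, the profile reduces to $\phi_N = N_T$ with no plankton, which is also a steady state of the source terms.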


4.3.2  Temporal  convergence 

It  is  important  to  examine  the  behaviour  of  the  equations  over  time  for  different 
depths  because,  for  certain  parameter  sets,  the  equations  may  become  chaotic.  With 
chaotic  solutions,  the  numerical  solution  may  appear  to  be  unstable,  while  the  scheme 
is  correct.  We  found  that  this  is  particularly  important  for  unstructured  meshes  where 
element  edges  are  not  necessarily  aligned  with  the  depth.  Therefore  it  is  worthwhile 
to  look  at  the  most  simplihed  problem  hrst,  the  zero- velocity,  biological  reactions 
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only,  system: 


^  — h  (ip0p  +  +  (1  —  a)— 0z(l  —  e  (4.18) 

(pN  +  ks  V 

-  dp4>P  -  ^0z(l  -  (4.19) 

0Ar  +  fcs  V 

—dz4>z  +  ci^4>z{^  —  e  (4.20) 

An  explicit  first-order  Euler  scheme  is  compared  to  an  explicit  fourth-order  low  stor¬ 
age  Runge-Kutta  (LSRK)  scheme  (Hesthaven  and  Warburton,  2008,  Carpenter  and 
Kennedy,  1994)  for  time  integration.  If  an  implicit  scheme  was  used,  one  would  have 
to  to  solve  the  non-linear  source  terms  using,  for  example,  a  Newton-Raphson  it¬ 
erative  solver  at  each  time  step.  Here,  since  we  utilize  an  explicit  time  integration 
scheme,  the  size  of  the  stable  time  step  may  be  limited  by  the  timescale  of  the  biologi¬ 
cal  reaction  instead  of  the  time  step  size  for  the  advection  terms  in  the  solution  of  the 
two-dimensional  ADR  equations.  Therefore  many  one-dimensional  numerical  tests 
were  conducted  in  order  to  hnd  the  limiting  time-step  sizes  and  appropriate  initial 
conditions  for  the  biological  equations.  For  the  one-dimensional  numerical  tests,  the 
following  parameters  were  varied: 

•  Timestep  size:  [4,2,1.01,1,0.99,0.9,0.5,0.25,0.1,0.005] 

•  Solver:  [Erst  order  explicit  Euler,  fourth  order  explicit  LSRK] 

•  Biological  Parameters:  [parameter  set  1,  parameter  set  2] 

•  Initial  Conditions:  [Constant,  Steady  State  with  parameter  set  1,  Steady  State 
with  parameter  set  2] 

Figures 4-2 and 4-3 show the final depth concentration profiles for parameter
set 1 and parameter set 2, respectively, for the two time integration schemes
after 100 days, starting from the steady-state solutions of parameter set 2 and
parameter set 1, respectively. The true solution is taken as the final profile
found using LSRK with a small time step size of 0.005 days.


[Figure 4-2 panel titles: (a) steady state conditions using Parameter set 2; (b) dt=0.9, Euler for Parameter set 1 at Time=100; (c) dt=0.9, LSRK4 for Parameter set 1 at Time=100; (d) dt=0.005, LSRK4 for Parameter set 1 at Time=100]

Figure 4-2: Concentration of biological constituents with parameter set 1 after 100
days of integration using the steady-state solution of parameter set 2 for the initial
condition (a). The bottom plot (d) is taken as the true solution and uses a small time
step with the LSRK time integration scheme. Plot (c) uses LSRK with a large time
step, and plot (b) uses the Euler time integration scheme. Plots (a-d) show the
concentration of each constituent with depth on the left, and the evolution of
the amount of each constituent at different depths (their 'orbits') on the right.

[Figure 4-3 panel titles: (a) steady state conditions using Parameter set 1; (b) dt=0.9, Euler for Parameter set 2 at Time=100; (c) dt=0.9, LSRK4 for Parameter set 2 at Time=100; (d) dt=0.005, LSRK4 for Parameter set 2 at Time=100]


Figure 4-3: Concentration of biological constituents with parameter set 2 after 100
days of integration using the steady-state solution of parameter set 1 for the initial
condition (a). The bottom plot (d) is taken as the true solution and uses a small time
step with the LSRK time integration scheme. Plot (c) uses LSRK with a large time
step, and plot (b) uses the Euler time integration scheme. Plots (a-d) show the
concentration of each constituent with depth on the left, and the evolution of
the amount of each constituent at different depths (their 'orbits') on the right.

Of the two time integration schemes tested, the LSRK scheme performed best.
The Euler time integration scheme is not as accurate as the LSRK scheme: its
final profile using a time step of 0.9 days does not match the “true” solution, whereas
the LSRK scheme does match the “true” solution when using the same large 0.9 day
time step. Both schemes become less stable and less accurate at shallow depths when
the time step size exceeds 1 day. While the timescale of the equations depends on the
relative concentrations of the constituents and the depth, these results suggest that
the timescale is on the order of 1 day. A detailed analysis of the timescales involved
with this system is carried out in Burton (2009), where it was found that the biological
timescales vary with depth and with the value of the concentration at a point, and
range between zero and 0.2 days, which is faster than what we found necessary for
numerically consistent answers. With the results of Burton (2009) in mind, we use the
LSRK scheme with a time step size smaller than 0.2 days. Note that for longer
integration times, these results may not hold, and a smaller time step size may be
required for an accurate solution. Longer integration times not only allow small
errors to grow, but may put the biological system into a state which has a faster
response time, leading to a smaller time scale.
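
A short linear-stability argument, offered here only as an illustration, shows how an observed step-size limit relates to a reaction timescale. It assumes the reaction terms are linearized about a state so that the fastest mode decays at a rate $\lambda$ (a symbol introduced for this sketch, not used elsewhere in the chapter):

\begin{align}
\frac{d\phi}{dt} = -\lambda\,\phi
\quad\Rightarrow\quad
\phi^{n+1} = (1 - \lambda\,\Delta t)\,\phi^{n},
\qquad
|1 - \lambda\,\Delta t| \le 1 \iff \Delta t \le \frac{2}{\lambda}.
\end{align}

Under this simplification, a forward Euler stability limit near $\Delta t \approx 1$ day would correspond to $\lambda \approx 2$ day$^{-1}$, that is, a timescale $1/\lambda$ of roughly half a day, which is the same order of magnitude as the estimate above. The non-linear system can visit states with faster modes, which is consistent with the more conservative step size adopted here.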

There are a number of differences between the solutions for the two different
parameter sets. Recall that the second parameter set has a higher Zooplankton grazing
rate and a lower death rate than the first parameter set. First, notice that the
concentration profiles of all the constituents at the final time step are more variable
in depth for the second parameter set (see Figure 4-3) than for the first parameter set
(see Figure 4-2). This suggests a finer spatial discretization will be necessary to
accurately capture the physics for the second parameter set. Also, looking at the
evolution (the orbits) of the constituents, the second parameter set evolves much
faster than the first set at the shallowest depth (-10 [m]). At the deepest depths
(-37.5 and -50 [m]), the second parameter set evolves more slowly, and at the depth of
-25 [m], the evolution of the two parameter sets is similar. This suggests that the
vertical behaviour of the two parameter sets will be significantly different.

As a final test, the simulations were initialized using the steady state solution
calculated using equations 4.12 to 4.17. It was found that the steady state solution was
maintained throughout the integration length of 100 days, and beyond, for both
integration schemes. Therefore, for the integration time of interest, the time
integration scheme is stable.


4.4  Two  dimensional  tracer  advection 

4.4.1  Implementation 

A  number  of  different  DG  implementations  were  written  for  solving  the  advection- 
diffusion  equations  (that  is  equation  4.1  with  S'  =  0).  The  particular  implementation 
here  uses  the  same  LSRK  time  integration  scheme  used  for  the  source  terms  along 
with  a  quadrature-free  DG  spatial  discretization  in  the  strong  form  with  a  nodal  basis. 
Diffusive  terms  can  be  treated  explicitly  or  implicitly  with  an  LDG  discretization.  The 
implicit  implementation  is  limited  to  an  implicit-Euler  time  integration,  but  could  be 
easily  extended.  Because  only  small  values  of  k  are  used  for  this  test-case,  implicit 
time  integration  is  not  necessary  for  an  efficient  solution  scheme. 

Considerable efficiency was gained by using a quadrature-free scheme, and the
unfiltered use of the quadrature-free scheme was possible because the differential part
of the equations is linear.

4.4.2  Higher  order  Advection 

Before solving the two-dimensional ADR equations on the specified test problem,
the numerical implementation was verified. A number of test cases were used to test
the convergence of the numerical implementation, and it was found that the optimal
p + 1 rate of convergence was achieved for a p-th order basis function. The test cases
used include tracer advection through periodic domains with constant velocity fields,
tracer advection using a rotating field, and tracer advection using a swirling velocity
field (LeVeque, 1996). Additionally, a set of cases was run using the NPZ test case
described in Section 4.2 to examine the cost of using higher order advection schemes.
The results are reported in Table 4.2. Note that, following Hesthaven and Warburton
(2008), the cost over the simulation time scales with C(T) Nt and a power of (p + 1),
where C(T) is a function dependent on the total integration time T, Nt is the number of
triangles, and p is the order of the basis. From Table 4.2 a disconcerting trend is
observed. Even for a similar number of degrees of freedom, the solutions using higher
order basis functions are more expensive than those using lower order basis functions.
The reason for this is two-fold: first, because an explicit integration scheme is used,
the stable time-step size decreases when using higher order bases and more steps are
required to finish the integration; second, higher order bases are inherently more
expensive because the local matrix operators are larger, and the local operations scale
as O(Np), where Np is the number of nodes per element. These results would suggest that
there is no advantage to using higher order bases for calculations; however, the
accuracy of the scheme also needs to be examined.

                          Grid 1       Grid 2       Grid 3
                          Nt = 226     Nt = 856     Nt = 3438

p = 1       DOF              678        2,568       10,314
(Np = 3)    Time [s]          16           73          681
            Time/DOF      0.0236       0.0284       0.0660

p = 2       DOF            1,356        5,136       20,628
(Np = 6)    Time [s]          34          218        2,551
            Time/DOF      0.0251       0.0424       0.1237

p = 3       DOF            2,260        8,560       34,380
(Np = 10)   Time [s]          79        1,473       16,141
            Time/DOF      0.0350       0.1721       0.4695

Table 4.2: Simulation time for various degrees of freedom using different order basis
functions. Timing reported using 3.4GHz Intel Linux nodes.
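
The bookkeeping behind Table 4.2 can be reproduced with a few lines. The sketch below assumes the standard node count for a degree-p nodal basis on a triangle, Np = (p + 1)(p + 2)/2, and a single tracer, so that DOF = Nt × Np; the triangle counts and run times are copied from the table, and the function names are hypothetical.

```python
def nodes_per_triangle(p):
    """Number of nodes of a complete degree-p polynomial basis on a triangle."""
    return (p + 1) * (p + 2) // 2


# Triangle counts and wall-clock times [s] as reported in Table 4.2.
TRIANGLES = {"Grid 1": 226, "Grid 2": 856, "Grid 3": 3438}
TIMES = {1: [16, 73, 681], 2: [34, 218, 2551], 3: [79, 1473, 16141]}


def cost_table():
    """Return rows of (p, grid, DOF, time [s], time per DOF [s])."""
    rows = []
    for p, times in TIMES.items():
        for (grid, n_t), t in zip(TRIANGLES.items(), times):
            dof = n_t * nodes_per_triangle(p)
            rows.append((p, grid, dof, t, t / dof))
    return rows
```

Comparing, for instance, the p = 1 run on Grid 2 (2,568 DOF, 73 s) with the p = 3 run on Grid 1 (2,260 DOF, 79 s) shows the trend discussed above: at a similar number of degrees of freedom, the higher order run is not cheaper.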

In order to illustrate the difference in accuracy for lower and higher order schemes,
a cosine bell was advected twenty times through a periodic unit square domain. The
final shapes of the cosine bell for the lower and higher order schemes are compared in
Figure 4-4. Figure 4-4 shows that the solution using the higher order basis function
is more accurate while using fewer degrees of freedom and a lower computational cost.
That is, the result using the lower-order scheme loses 20% of its original height due
to numerical diffusion, while the result with the higher-order scheme looks nearly
identical to the initial condition. These results illustrate an important point:
high-order methods cannot be compared to lower order methods on a degree of freedom
(DOF) basis but should be compared on an efficiency basis. The better method will have
a higher accuracy at the same computational cost, or a lower cost at the same accuracy.

[Figure 4-4 panel titles: Low Order (p=1, Time=260s, DoF=10,300); Initial Condition; High Order (p=6, Time=100s, DoF=6,300)]

Figure 4-4: Comparison of accuracy for twenty periods of linear advection of a cosine
bell through a periodic domain.
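
The loss of peak height through numerical diffusion can be reproduced in a much simpler setting than the DG runs reported here. The sketch below swaps in a one-dimensional, first-order upwind finite-volume scheme on a periodic unit interval (an assumption made purely for illustration; it is not the scheme used in this thesis) and advects a cosine bell at unit speed. The function names and the bell's centre and radius are hypothetical.

```python
import math


def cosine_bell(x, center=0.5, radius=0.25):
    """Cosine bell of unit height supported on |x - center| < radius."""
    r = abs(x - center)
    return 0.5 * (1.0 + math.cos(math.pi * r / radius)) if r < radius else 0.0


def advect_upwind(n_cells, n_periods, cfl=0.5):
    """Advect the bell at unit speed through a periodic unit interval."""
    dx = 1.0 / n_cells
    u = [cosine_bell((i + 0.5) * dx) for i in range(n_cells)]
    steps_per_period = int(round(1.0 / (cfl * dx)))
    for _ in range(n_periods * steps_per_period):
        # u[i - 1] wraps to the last cell when i == 0, which gives periodicity.
        u = [u[i] - cfl * (u[i] - u[i - 1]) for i in range(n_cells)]
    return u
```

Calling `max(advect_upwind(100, 1))` after a single period already gives a peak clearly below the initial height of one, while the sum of the cell values is unchanged: the scheme conserves the tracer but smears it, which is the low-order behaviour Figure 4-4 contrasts with the high-order result.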


4.4.3 Test case advection with potential flow field

In order to demonstrate spatial convergence, a series of meshes is required. These
are shown in Figure 4-5. The meshes were created using GMSH^1 (Geuzaine and
Remacle, 2009), which is a freely available mesh generator. Note that the elements
are not curved, which may lead to problems when using high order elements (Bernard,
2008).

Figure 4-6 illustrates the h and p convergence of the pure tracer advection problem
for the calculated potential flow field. In Figure 4-6, G#1 (for example) refers to the
use of Grid 1 as labeled in Figure 4-5. The top row of Figure 4-6 shows the
solution being refined from left (coarser grid) to right (finer grid) while retaining
the same features as the mesh is refined. The bottom row of Figure 4-6 shows the
solution using a lower (left) and higher (right) order basis function, again retaining
the same features as the order of the basis is increased. Qualitatively, this indicates
that the solution is converging when refining h, the mesh, and p, the order of the basis.

1 www.geuz.org/gmsh

Figure 4-5: GMSH created meshes used for convergence studies.

Next, temporal convergence is demonstrated for the test problem. Figure 4-7
shows that the same final solution is obtained for a relatively large time step (bottom
left plot) compared to a smaller time step (top left plot). Additionally, Figure 4-7
illustrates that the LSRK scheme allows a larger time step to be used than what is
allowed for the Euler time integration scheme (bottom right plot), which becomes
unstable for dt = 0.07. The top right plot shows the initial condition for this flow.
Thus, qualitatively, this indicates that the solution is converging when refining the
time step size, and that LSRK allows a larger time step than the Euler scheme
while maintaining numerical stability.


4.5 Solution of biogeochemical reaction equations

The numerical implementation for this work was compared with the code
implemented by Burton (2009), and the models agreed. Note that the stable time step
size for advection is smaller than the stable time step for biology. Therefore the
time step size of the ADR equations is limited by advection for the chosen biological
parameters.

The solution at the final time is shown in Figure 4-8 for the first parameter set,
and in Figure 4-9 for the second parameter set. Both simulations used third-order
basis functions on grid 2 (see Figure 4-5).

[Figure 4-6 panel titles, bottom row: G#3, p=2; G#3, p=4]

Figure 4-6: h-p convergence of the purely advective test case for the NPZ test case
problem. G# refers to the grid used, as indicated in Figure 4-5, h refers to the size
of elements, and p refers to the order of the basis. The top row demonstrates h
convergence, that is, the same solution is maintained as the mesh is refined. The
bottom row demonstrates p convergence, that is, the same solution is maintained as
the order of the basis is increased.

Figure 4-7: Temporal convergence of purely advected flow. The reference solution is
calculated using a small (dt=0.018) time step (top left plot) and the bottom row gives
the solution for larger time steps using LSRK (left) and Euler (right) time-stepping
schemes. The initial conditions are plotted on the top right.

[Figure 4-8 panel titles: Time=0 with bases of order [3,3,3] for [N,P,Z]; Time=100 with bases of order [3,3,3] for [N,P,Z]; columns: Nitrogen, p=3; Phytoplankton, p=3; Zooplankton, p=3]

Figure 4-8: Initial and final time step for the NPZ two-dimensional test case using
parameter set 1 on grid 2 with third-order basis functions. The simulation took 446
seconds, and color bars shown are in [μmol].

Examining the solutions, there are a number of interesting features. Note that the
flow field chosen, combined with the length of integration and the periodic boundary
conditions, causes the fluid to pass over the obstruction approximately two times,
where the exact value depends on the specific depth. The final solution is significantly
different from the initial solution, which shows that the advection has a significant
effect on the biological fields. Specifically, the final solution is less uniform and
contains finer structures than the initial solution. In particular, there is a region
of significant Phytoplankton growth downstream of and above the obstacle (depth around
-30) that develops within the first 50 days and is maintained throughout the simulation.
What the model is capturing is nutrient-rich water being brought up from below into the
light-penetrating region. These nutrients are consumed by Phytoplankton, resulting in
a “bloom.” The maintenance and stability of this bloom is unique to this parameter
set and these flow conditions, and cannot be expected in general.

There are also a number of differences between the two parameter sets. The second
parameter set has a higher Zooplankton grazing rate (0.13 compared to 0.1), and a
lower Zooplankton mortality rate (0.06 compared to 0.08), which effectively increases
the amount of Zooplankton in the system. The effect of increased Zooplankton can
be seen in the Zooplankton and Phytoplankton plots, where a higher concentration
of Zooplankton and a lower concentration of Phytoplankton are present in Figure 4-9
than in Figure 4-8. Therefore, for the second parameter set, Phytoplankton is being
consumed at a faster rate by Zooplankton. With less Phytoplankton, fewer Nutrients
are being consumed, and more Nutrients remain in the system. A particularly
interesting feature is the smoothly varying Zooplankton field for parameter set 1.
Because this field is more uniform, one can conceivably gain some computational
efficiency by decreasing the order of the basis used to calculate this field. This
observation leads to the discussion in the next section, where different orders of
basis functions are used for the different constituents.

[Figure 4-9 panel titles: Time=0 with bases of order [3,3,3] for [N,P,Z]; Time=100 with bases of order [3,3,3] for [N,P,Z]; columns: Nitrogen, p=3; Phytoplankton, p=3; Zooplankton, p=3]

Figure 4-9: Initial and final time step for the NPZ two-dimensional test case using
parameter set 2 on grid 2 with third-order basis functions. The simulation took 423
seconds, and color bars shown are in [μmol].

The numerical scheme performs as expected. The current implementation takes
420-450 seconds for 2,600 integration steps of 22,710 degrees of freedom. The cost
of the solutions for higher values of k was also examined. It was found that the
explicit schemes became prohibitively expensive for relatively small values of k (grid
Peclet numbers < 100), and it was clear that an implicit solution would be necessary
for efficiently dealing with diffusion.
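
The trend described above can be illustrated with the usual scaling estimates for explicit DG time steps: an advective limit proportional to h/(|u| p^2) and a diffusive limit proportional to h^2/(k p^4). The sketch below is only indicative. The proportionality constants `c_adv` and `c_diff` are unknown and set to one as an assumption, the function names are hypothetical, and the grid Peclet number is taken as |u| h / k.

```python
def grid_peclet(h, speed, kappa):
    """Grid Peclet number: ratio of advective to diffusive transport over one element."""
    return speed * h / kappa


def explicit_step_limits(h, speed, kappa, p, c_adv=1.0, c_diff=1.0):
    """Return (advective, diffusive) step-size estimates for an explicit scheme."""
    dt_adv = c_adv * h / (speed * p ** 2)
    dt_diff = c_diff * h ** 2 / (kappa * p ** 4)
    return dt_adv, dt_diff
```

With these estimates the ratio of the diffusive to the advective limit equals the grid Peclet number divided by p^2 (times the ratio of the constants), so the diffusive limit shrinks relative to the advective one as k grows or as the order increases. This is the regime in which an implicit treatment of diffusion becomes attractive.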

4.5.1  Variable  order  basis  functions 

As observed from the results in the previous section, some computational
efficiency could be gained by using different orders of basis functions for the
different constituents on the same triangulation. What is being proposed here is a
p-adaptive numerical scheme across constituents. h-adaptivity across constituents,
that is, using different meshes for different constituents, is not considered. Normally,
adaptive schemes that change the discretization do so for all variables. Often, a
problem with these schemes is finding an adaptation criterion based on one of the
variables that improves the accuracy of the solution for all the variables. This
adaptation criterion could also be based on multiple variables, but this results in
regions of the mesh being refined for a variable that does not need refinement in that
region to improve the accuracy of the simulation. The complexity increases
dramatically when larger systems, such as a 24-constituent biological model, are used.

What is being proposed here is that each constituent adapts independently of the
other two constituents based on constituent-dependent adaptation criteria. In order
to explore this new idea, our MATLAB code was extended to allow different
constituents to use different orders of basis functions. As part of the extension, the
code allows spatially varying orders of basis to be used on the same mesh. However,
for these tests, only a global change to the order of basis function was made. Also,
since appropriate adaptive criteria for the biological constituents have not yet been
developed, the initially chosen order of basis function is maintained throughout the
simulation time. Eventually, in order to gain maximum efficiency, the order of the
basis function could be allowed to change dynamically, spatially, and independently
for each constituent.
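
A minimal sketch of the bookkeeping this implies is given below: each constituent carries its own polynomial order on a shared triangulation, and its number of unknowns follows from the node count of a degree-p triangle. This is an illustration in Python rather than the MATLAB code referred to above, and the function names are hypothetical.

```python
def nodes_per_triangle(p):
    """Number of nodes of a complete degree-p polynomial basis on a triangle."""
    return (p + 1) * (p + 2) // 2


def dof_per_constituent(n_triangles, orders):
    """Map each constituent name to its number of degrees of freedom."""
    return {name: n_triangles * nodes_per_triangle(p)
            for name, p in orders.items()}


def total_dof(n_triangles, orders):
    """Total unknowns over all constituents on the shared mesh."""
    return sum(dof_per_constituent(n_triangles, orders).values())
```

For a hypothetical mesh of 100 triangles, `total_dof(100, {"N": 3, "P": 3, "Z": 3})` gives 3000 unknowns, whereas lowering only Zooplankton to first order, `{"N": 3, "P": 3, "Z": 1}`, gives 2300. The saving in unknowns is the potential gain; the interpolation needed to couple constituents of different order is the cost that has to be weighed against it.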

To examine the feasibility of the proposed scheme, the test case for this section was
run on grid 2 using third order basis functions for the Nutrient and Phytoplankton
fields, and varying the order of the basis for the Zooplankton fields. The results of
these simulations are plotted in Figures 4-10 to 4-12. Note that the fields are plotted
both using the higher-order bases as well as the first order basis. The first-order basis
fields are found by interpolating from the higher-order basis, and this allows better
comparison of the results. Also, for reference, the solution using first order basis
functions for all constituents is plotted in Figure 4-13. Finally, the difference between
the fields in Figure 4-10 and Figure 4-11 is plotted in Figure 4-14, and the difference
between the fields in Figure 4-10 and Figure 4-12 is plotted in Figure 4-15.

Figure 4-10: Final time step for the two-dimensional NPZ test case using parameter
set 1 on grid 2 with third-order bases (top row) for Nitrogen, Phytoplankton and
Zooplankton, taking 446 seconds. The projection of this solution onto first order
bases is provided on the bottom row for comparison with the reference solution.
Color bars shown are in [μmol].

Qualitatively examining the solutions for the biology, the Zooplankton solutions
look the same regardless of the order of basis used for this study. Also, the major
features for the Nitrogen and Phytoplankton fields remain qualitatively similar. What
is not obvious from the plots is the shift in the vertical position of the Phytoplankton
bloom. This is highlighted by the difference plots (Figures 4-14 and 4-15), where
the phase error causes a large difference in the solutions compared with the highest
order basis function solution. Part of this error is due to the velocity field; for these
simulations, the velocity field is solved on the grid that the constituent uses, which
means the Zooplankton simulation uses a lower accuracy velocity field. The same
phase error is present in the Nitrogen field. Apart from the phase error, the shapes of
the features in the Nitrogen and Phytoplankton fields are also different. In particular,
a small high-concentration Phytoplankton region below the main bloom is not present
when the lowest order basis is used for Zooplankton. Qualitatively then, the solutions
are similar enough to further examine the possibility of this type of adaptive scheme.

[Figure 4-11 panel titles: Time=100 with bases of order [3,3,2] for [N,P,Z]; columns: Nitrogen, p=3; Phytoplankton, p=3; Zooplankton, p=2]

Figure 4-11: Final time step for the two-dimensional NPZ test case using parameter
set 1 on grid 2 with third-order bases (top row) for Nitrogen and Phytoplankton,
and second order basis for Zooplankton, taking 357 seconds. The projection of this
solution onto first order bases is provided on the bottom row for comparison with the
reference solution. Color bars shown are in [μmol].

Figure 4-12: Final time step for the two-dimensional NPZ test case using parameter set 1
on grid 2 with third-order bases (top row) for Nitrogen and Phytoplankton, and first
order basis for Zooplankton, taking 296 seconds. The projection of this solution onto
first order bases is provided on the bottom row for comparison with the reference
solution. Color bars shown are in [μmol].

Figure 4-13: Final time step for the two-dimensional NPZ test case using parameter set 1
on grid 2 with first-order bases for Nitrogen, Phytoplankton and Zooplankton, taking
47 seconds. Color bars shown are in [μmol].

Figure 4-14: Difference between fields calculated using a third order basis for
Zooplankton minus using a second order basis for Zooplankton. Note the top right plot
is a projection of the difference onto a first order basis for Zooplankton. Nitrogen
and Phytoplankton both use third order bases, and the difference projected onto first
order bases is plotted on the bottom row. Color bars shown are in [μmol].

Figure 4-15: Difference between fields calculated using a third order basis for
Zooplankton minus using a first order basis for Zooplankton. Nitrogen and Phytoplankton
both use third order bases, and the difference projected onto first order bases is plotted
on the bottom row. Color bars shown are in [μmol].

Examining the difference plots (Figures 4-14 and 4-15), the largest differences in
the Nitrogen and Phytoplankton fields are in narrow regions. This is encouraging,
because it suggests that using higher order bases for Zooplankton in the small banded
regions of highest error may improve the solution. However, refining the solution in
the narrow bands may not improve the accuracy, because it is possible that the error
originates elsewhere and grows, showing up in the narrow bands. An adjoint-type
refinement metric would account for the linearized part of this effect. Nonetheless,
the difference between the third and second order basis used for Zooplankton is smaller
than the difference between the third and first order basis used for Zooplankton, as
expected. Thus, refinement somewhere will improve the accuracy of the solution,
and due to the local nature of the errors, local refinement should be sufficient.

Considering the simulation time, efficiency is gained by reducing the order of the
basis for Zooplankton. A 20% gain is realized by decreasing the order from three to
two (calculated using 100% × (T3 - T2)/T3, where Tp is the run time with a p-th order
Zooplankton basis), whereas a 33% gain is realized by decreasing the order from three
to one. This translates into significant savings, which should not necessarily be
expected. Part of the efficiency of quadrature-free implementations comes from
eliminating the need to interpolate. With one constituent no longer on the same grid as
the other two, interpolation is required for calculating the value of the source terms.
Therefore, by reducing the order of the basis, computational efficiency is gained
because fewer degrees of freedom exist and matrix operators are smaller, but an
additional interpolation cost is introduced. If local p-adaptation is used within the
mesh for a single constituent, that is, various orders of bases are used for the same
constituent, an additional edge interpolation is introduced for quadrature-free
implementations. While not as expensive as the volumetric interpolations, this
cost does affect the decision to adapt the order of the basis according to the order
of neighboring elements. It is conceivable that decreasing the order of a basis in an
element could increase computational cost: if the element is surrounded by
neighbouring elements of both equal and different order, the introduced interpolation
cost could overwhelm the efficiency gain from the lower-order basis.
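
The reported gains follow directly from the run times quoted in the captions of Figures 4-10 to 4-12 (446, 357 and 296 seconds for third, second and first order Zooplankton bases). A small sketch of the arithmetic, with a hypothetical helper name:

```python
def percent_gain(t_ref, t_new):
    """Relative reduction in run time, in percent of the reference time."""
    return 100.0 * (t_ref - t_new) / t_ref


# Run times [s] keyed by the order of the Zooplankton basis.
RUN_TIMES = {3: 446.0, 2: 357.0, 1: 296.0}

gain_3_to_2 = percent_gain(RUN_TIMES[3], RUN_TIMES[2])
gain_3_to_1 = percent_gain(RUN_TIMES[3], RUN_TIMES[1])
```

Evaluating these gives approximately 20.0% and 33.6%, in line with the figures quoted above.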

In order to examine when it would be efficient to decrease the order of a basis
function, a detailed operation count for the current implementation was conducted.
It was found that adapting from a high order basis to a first order basis is always more
efficient. However, it was calculated that if a single element in a three-constituent
model discretized uniformly with 10th-order basis functions is adapted down to a
9th-order basis, the calculations associated with that element would increase by
roughly 40%. Hence, it would be considerably more expensive.

The  operation  count  emphasized  which  operations  are  crucial  for  maximizing  the 
numerical  efficiency.  Whether  or  not  it  is  efficient  to  reduce  the  order  of  a  basis  for  a 
constituent  on  a  single  element  depends  on:  the  number  and  order  of  basis  of  the  other 
constituents  on  that  element;  and  the  order  of  the  bases  of  the  surrounding  elements 
for  the  same  constituent.  Thus,  if  the  adaptation  criterion  determines  that  an  element 
for  a  particular  constituent  can  be  of  lower  order  without  sacrificing  accuracy,  the 
scheme  needs  to  ensure  that  efficiency  is  also  gained  by  the  adaptation  before  adapting 
the  basis. 

Also, for an advanced treatment, the decision to adapt a group of constituents or
a group of elements could be coupled. That is, for example, if only one constituent
adapts, it may be more expensive; however, if all the constituents adapt, efficiency
can be gained. If the adaptation criterion only considers the cost of one constituent
adapting at a time, then no adaptation may take place, whereas if the adaptation
criterion considers the cost of simultaneous adaptation of constituents, adaptation
may be efficient and should take place. Therefore, the most efficient adaptation
criterion would consider the cost of simultaneous constituent and element adaptation.
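
The coupling argument can be made concrete with a deliberately crude cost model. The sketch below is a toy: it is not the operation count carried out for this work and does not reproduce its 40% figure. It assumes that each constituent costs Np^2 operations per element for its dense local operator, and that evaluating the source terms requires an Np_i × Np_j interpolation for every ordered pair of constituents whose orders differ. The function names are hypothetical.

```python
def nodes_per_triangle(p):
    """Number of nodes of a complete degree-p polynomial basis on a triangle."""
    return (p + 1) * (p + 2) // 2


def element_cost(orders):
    """Toy per-element operation count for constituents of the given orders."""
    nodes = [nodes_per_triangle(p) for p in orders]
    cost = sum(n * n for n in nodes)  # dense local operator per constituent
    for i, n_i in enumerate(nodes):
        for j, n_j in enumerate(nodes):
            if i != j and orders[i] != orders[j]:
                cost += n_i * n_j  # interpolate constituent j onto i's nodes
    return cost


def should_adapt(current, proposed):
    """Adapt only if the proposed set of orders is cheaper under the toy model."""
    return element_cost(proposed) < element_cost(current)
```

Under these assumptions, `should_adapt([10, 10, 10], [10, 10, 9])` is false, because the new interpolations outweigh the smaller operator, whereas `should_adapt([10, 10, 10], [9, 9, 9])` is true. A criterion that only tests one constituent at a time would therefore never reach the cheaper, fully adapted state, which is the point made above.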

The cost considerations that we found thus far have applied to quadrature-free
implementations, and do not apply in the same way for quadrature-based
implementations. Because interpolation to Gauss points is required in quadrature-based
implementations, the introduced volume and edge interpolation will not add significant
cost to the scheme, but will mainly contribute to the complexity of the code.

Finally, comparing the solution with uniformly third order basis functions to the
solution with uniformly first order basis functions, the need for an adaptive scheme is
highlighted. The small region of high Phytoplankton concentration below the main
Phytoplankton bloom to the right of the bathymetry is not detectable in the
simulation using only first order basis functions. While the solutions do appear
similar, the fields are not as smooth as the coarser solution would suggest. Properly
implemented adaptive schemes would refine the solution locally so that important
small-scale features are resolved, whereas a non-adaptive lower resolution scheme would
not resolve these features and they would be missed. Therefore, for improved accuracy
of simulations, adaptive schemes are crucial.


4.6  Conclusions  and  recommendations 

Temporal, h and p convergence was demonstrated for each part of the solution
of the ADR equations. It was found that the LSRK integration scheme performed
better than the first order Euler scheme both for integrating the biological source
terms and the full ADR equations.

From the one-dimensional source terms tests, it was found that the second
parameter set (higher Zooplankton grazing rate and lower Zooplankton death rate) had a
finer-scale structure in the vertical direction. This finding was repeated with the full
solution of the ADR equations.

The  importance  of  comparing  higher  order  to  lower  order  schemes  on  an  efficiency- 
accuracy  basis  was  highlighted  through  a  purely  advective  test  case.  The  higher  order 
scheme  was  more  accurate,  more  efficient,  and  used  fewer  degrees  of  freedom  than  the 
lower order scheme.

Using explicit time integration schemes with a quadrature-free method resulted in
an efficient numerical scheme. However, it was found that explicit time integration of
diffusive terms was prohibitively expensive for grid Peclet numbers smaller than 100
(or large values of $\kappa$).

Due to the uniformity of the solution for Zooplankton with the first parameter set,
the use of a p-adaptive scheme (changing only the order of the basis function) across
constituents was examined. It was found that such an adaptation scheme is promising
for improving efficiency and accuracy of the solution; however, an operation count
showed that numerical cost considerations need to be made. Specifically, additional
volume and edge interpolation operations result from p adaptation, and may increase
the cost of a quadrature-free implementation even when the adaptation reduces the
order of the basis on an element.

Finally, the need for adaptive algorithms was highlighted by noting that an
important small-scale feature was unresolved in a coarse discretization, whereas a finer
discretization resolved this feature. Properly implemented adaptive algorithms would
locally resolve important small-scale features and gain efficiency with coarse
discretizations in regions where lower accuracy is sufficient.

For the examples we considered, it is recommended that high-order quadrature-free
adaptive algorithms be used whenever possible for optimum accuracy and
efficiency. Also, explicit time integration is recommended for advective operators,
whereas implicit time integration schemes are recommended for diffusive operators
due to prohibitively expensive numerical stability constraints.
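
The last recommendation can be illustrated with a rough estimate of the explicit time-step limits. The sketch below is written in Python and assumes only the textbook scalings dt_adv ~ h/|u| and dt_diff ~ h^2/kappa, with every order-dependent stability constant set to one. The function name and the sample values of h, u and kappa are hypothetical and are not taken from the test cases of this chapter.

```python
def explicit_time_step_limits(h, u, kappa):
    """Return (dt_adv, dt_diff, grid_peclet) using unit stability constants.

    Assumed scalings (constants and dependence on the order p are omitted):
      advection: dt ~ h / |u|
      diffusion: dt ~ h**2 / kappa
    """
    dt_adv = h / abs(u)
    dt_diff = h * h / kappa
    grid_peclet = abs(u) * h / kappa
    return dt_adv, dt_diff, grid_peclet


if __name__ == "__main__":
    # Hypothetical values: unit velocity, diffusivity 0.01, three element sizes.
    for h in (1.0, 0.1, 0.01):
        dt_adv, dt_diff, pe = explicit_time_step_limits(h, u=1.0, kappa=0.01)
        print(f"h={h:<5} Pe={pe:<8.3g} dt_adv={dt_adv:.3g} dt_diff={dt_diff:.3g}")
```

Under these assumptions the ratio dt_diff/dt_adv equals the grid Peclet number, so refining h (or increasing kappa) makes the diffusive limit progressively more restrictive than the advective one.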
Chapter 5


Implicit  Solution  Techniques 


The time integration of DG discretized equations is often achieved by explicit
methods, such as Runge-Kutta (RK) schemes, and considerable efficiency is obtained
by using quadrature-free implementations. The problem with explicit time integration
is that the time step size is subject to the Courant-Friedrichs-Lewy (CFL) stability
condition. In a simulation with diffusion discretized using the Local Discontinuous
Galerkin (LDG) method (Cockburn and Shu, 1998a), the stable time step scales as
h? /p^  for  high  order  basis  functions  (Persson  and  Peraire,  2006),  where  h  is  the  char¬ 
acteristic  element  size  and  p  is  the  order  of  the  basis  function.  This  stability  criterion 
is very restrictive, and hence implicit schemes, which are not subject to the CFL
stability condition, are desirable. However, implicit methods require the inversion of
a large matrix, which may not be feasible to store for larger problems. Additionally,
solving the incompressible Navier-Stokes equations using a Projection Method
also requires the inversion of a large matrix. This chapter focuses on finding an
appropriate iterative solver and preconditioner combination to solve linear
advection-diffusion equations implicitly, with an emphasis on the preconditioner, but
this work also enables the solution of more complicated equations such as the
incompressible Navier-Stokes equations.
5.1 Review of solvers utilized for DG Schemes


DG schemes are often integrated in time using explicit schemes, and in a series of
papers, Cockburn et al. (1990) introduced and analyzed their explicit Runge-Kutta
DG (RKDG) scheme. The RKDG scheme has been used with considerable success.
Since then, Strong Stability Preserving (SSP) RK schemes have been developed
(Gottlieb et al., 2001). Explicit schemes make use of efficient matrix-vector
multiplications, and are often used in practice. However, for large three-dimensional
problems with widely ranging space and time scales, it is computationally efficient to
use implicit time integration schemes when the CFL condition is too stringent. Some
mixed implicit/explicit strategies have been suggested (Ascher et al., 1997, Kennedy and
Carpenter, 2003), but the remainder of this chapter focuses on implicit schemes.

For high computational efficiency and lower memory usage, an efficient data
structure is necessary. The compressed column format (Barrett et al., 1994) should
be avoided in favor of a dense block format which can minimize cache misses (Persson
and Peraire, 2006). Persson and Peraire (2006) store the block diagonal and
off-block-diagonal entries in separate arrays. A more sophisticated storage strategy is
required for the LDG discretization of the diffusive terms, since the final matrix has
additional non-zero entries. The more sophisticated strategy involves storage of a
number of smaller matrices. However, to further circumvent the storage problem
associated with LDG, Peraire and Persson (2007) proposed the Compact Discontinuous
Galerkin (CDG) discretization.

A proper preconditioner and solver combination is necessary for efficient matrix
inversion. In Persson and Peraire (2006), the equations are discretized using LDG and
solved using the Quasi-Minimum Residual (QMR) method, the Conjugate Gradient Squared
(CGS) method, the Generalized Minimal RESidual (GMRES) method, and restarted
GMRES(m) with restart value m. A p1-ILU(0) preconditioner was proposed and
tested. This preconditioner consists of a block ILU(0) preconditioner used as a
pre-smoother for a two-level p-Multi-Grid (MG) scheme. The p-MG scheme works by
using an orthogonal Koornwinder expansion to project the residual from a
higher-order basis function solution to a p = 1 solution (Persson and Peraire, 2006). The
correction is then calculated using a sparse-direct solver on the reduced problem
(Persson and Peraire, 2006). Three simplified test problems solving the compressible
Navier-Stokes equations were studied, and it was found that restarted GMRES(m)
with m = 20 worked well, especially when preconditioned with their proposed
p1-ILU(0) preconditioner (Persson and Peraire, 2006). Other researchers have also had
success with p-MG schemes (Fidkowski et al., 2005).

Later, Persson and Peraire (2008) solved generalized conservation laws using a
CDG discretization for the diffusive terms. In this paper, Persson and Peraire
consider several preconditioner options: block Jacobi, block Gauss-Seidel (GS), and
block incomplete LU factorizations with zero fill-in, ILU(0). Several other efforts are
also reviewed in Persson and Peraire (2008), most making use of MG methods. It
was found that the solution of pure-advection problems using an implicit scheme was
more robust than the solution of implicit advection-diffusion problems. The diffusive
terms were not adequately handled by an ILU(0) preconditioner, and it was shown
that diffusive problems often required a MG coarse-grid correction to improve
convergence and robustness (Persson and Peraire, 2008). Persson and Peraire (2008)
found that the p1-ILU(0) preconditioner outperformed all the other preconditioner
options, showing remarkable consistency and robustness over a range of test cases.
GMRES was found to be the fastest and most reliable solver, in general, at the cost of
increased computations and storage as the number of iterations increases (Persson and
Peraire, 2008). The combination of the p1-ILU(0) preconditioner with the GMRES
solver was found to be optimum in their case.


5.2 Novel studies on Solvers and Preconditioners for DG schemes

5.2.1  Description  of  solvers 

The GMRES(m) and QMR solvers are a part of MATLAB, while the BiCGSTAB($\ell$)
algorithm was obtained from Sleijpen (2009). Each has the following syntax:

    [X,FLAG,RELRES,ITER,RESVEC] = GMRES(A,B,RESTART,TOL,MAXIT,M1,M2,X0)
    [X,FLAG,RELRES,ITER,RESVEC] = QMR(A,B,TOL,MAXIT,M1,M2,X0)
    [X,RESVEC,ITER] = CGSTAB(A,B,X0,TR0,OPTIONS,M1,M2)

where X is the solution; FLAG contains error information; RELRES gives the relative
residual at the final iteration; ITER gives the number of iterations (for CGSTAB, see
footnote 1); RESVEC is a vector of convergence history; A is the problem matrix; B is
the right-hand-side vector; RESTART is the number m for GMRES(m); TOL is the
convergence criterion tolerance; MAXIT is the maximum number of iterations; M1 and M2
are the preconditioner matrices (see footnote 2); and X0 is the initial guess vector.
The BiCGSTAB($\ell$) implementation has a slightly different, but similar, syntax, the
major difference being that the tolerance, maximum iterations, value for $\ell$, and
other options are passed to the function via the OPTIONS structure. Also, a helper
function was required to input function handles instead of matrices for the
matrix-free preconditioners.

GMRES(m) 

The Generalized Minimum RESidual algorithm with restarts after m iterations
is based on the minimization of the residual $r_n = b - A x_n$, where
$x_n \in \mathcal{K}_n$ and $\mathcal{K}_n$ is the Krylov subspace formed after $n$
steps of Arnoldi iteration (Trefethen and Bau, 1997). The norm
$\| \tilde{H}_n y - \|b\| e_1 \|$ is minimized for $y$, and then $x_n = Q_n y$
(Trefethen and Bau, 1997), where $A Q_n = Q_{n+1} \tilde{H}_n$ and
$e_1 = (1, 0, \ldots, 0)^T$. That is, the columns of $Q_n$ are the first $n$ columns of
$Q$ in the reduction of $A$ to Hessenberg form, $A = Q H Q^*$, and $\tilde{H}_n$ is the
$(n+1) \times n$ upper Hessenberg matrix obtained at the $n$th Arnoldi iteration. This
algorithm works to solve the problem because the ever-increasing size of the Krylov
subspace approaches the span of the columns of A. Minimizing the residual over the
column space of A is the same as solving the problem exactly, and will happen when
$n = \mathrm{rank}(A)$, if A has full rank. The solution procedure orders A according to
its dominant eigenvalues, and hence its dominant reduced-rank (generalized inverse)
component.

^1 The original function from Sleijpen (2009) had to be modified to output this outer
iteration count.

^2 If M2 is empty, it is not an LU-type preconditioner.

GMRES becomes more expensive both in terms of memory and computation as
n becomes large, hence restarted GMRES(m) is often used, where the algorithm
is restarted using $x_m$ as the initial guess after m iterations. Whereas the
convergence of GMRES is monotonic with increasing iterations (Trefethen and Bau, 1997),
GMRES(m) may stagnate (Persson and Peraire, 2008). Each iteration of GMRES(m)
requires one matrix-vector multiplication, but becomes more expensive as m increases.
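
The restart mechanism described above can be sketched in a few lines. The following is a minimal, unpreconditioned GMRES(m) in plain Python for small dense real systems; it is not the MATLAB routine used in this study. The function names, the list-of-rows matrix storage, the Givens-rotation update of the least-squares problem, and the 3 x 3 test matrix are all assumptions made for illustration, and the matrix is assumed to be non-singular.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gmres_restarted(A, b, m=20, tol=1e-10, max_restarts=50):
    """Return (x, relative_residual, total_inner_iterations) for A x = b."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    total_iters = 0
    for _ in range(max_restarts):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        beta = math.sqrt(dot(r, r))
        if beta <= tol * b_norm:
            break
        Q = [[ri / beta for ri in r]]   # orthonormal Krylov basis vectors
        R = []                          # columns of the rotated Hessenberg matrix
        cs, sn = [], []                 # Givens rotation coefficients
        g = [beta]                      # rotated right-hand side beta * e1
        for j in range(m):
            # Arnoldi step with modified Gram-Schmidt orthogonalization
            w = matvec(A, Q[j])
            h = []
            for q in Q:
                hij = dot(w, q)
                h.append(hij)
                w = [wi - hij * qi for wi, qi in zip(w, q)]
            h_next = math.sqrt(dot(w, w))
            # apply the previous rotations to the new column
            for i in range(j):
                h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                                  -sn[i] * h[i] + cs[i] * h[i + 1])
            # new rotation that eliminates the sub-diagonal entry h_next
            denom = math.hypot(h[j], h_next)
            c, s = (h[j] / denom, h_next / denom) if denom else (1.0, 0.0)
            cs.append(c)
            sn.append(s)
            h[j] = denom
            g.append(-s * g[j])
            g[j] = c * g[j]
            R.append(h)
            total_iters += 1
            # |g[-1]| is the residual norm of the current least-squares solution
            if abs(g[-1]) <= tol * b_norm or h_next == 0.0:
                break
            Q.append([wi / h_next for wi in w])
        # back substitution for y, then x <- x + Q y (this is the restart point)
        k = len(R)
        y = [0.0] * k
        for i in reversed(range(k)):
            acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, k))
            y[i] = acc / R[i][i]
        for col in range(k):
            x = [xi + y[col] * qi for xi, qi in zip(x, Q[col])]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    return x, math.sqrt(dot(r, r)) / b_norm, total_iters


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 11.0, 8.0]            # chosen so that the exact solution is (1, 2, 3)
    print(gmres_restarted(A, b, m=20))
```

Choosing m smaller than the problem size forces restarts and shows the trade-off noted above: less storage per cycle in exchange for possibly slower convergence.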

QMR 

The Quasi-Minimum Residual method is also a Krylov subspace minimum residual
method, but the Krylov subspace is different in this case. QMR is based on tridiagonal
biorthogonalization methods. Here the A matrix is factored into $A = V T V^{-1}$, where
T is tri-diagonal, V is non-singular (but not unitary), and the columns of V are
orthogonal to the columns of $W = (V^{-1})^*$ (that is, $W^* V = I$). The Lanczos-like
equations that follow are

$$A V_n = V_{n+1} \tilde{T}_n \qquad (5.1)$$

$$A^* W_n = W_{n+1} \tilde{S}_n \qquad (5.2)$$

$$T_n = S_n^* = W_n^* A V_n \qquad (5.3)$$

where $V_n$, $W_n$ are $m \times n$, $\tilde{T}_n$ and $\tilde{S}_n$ are
$(n+1) \times n$ non-Hermitian tri-diagonal matrices, and $T_n$ and $S_n$ are the upper
$n \times n$ blocks of $\tilde{T}_n$ and $\tilde{S}_n$ respectively (Trefethen and Bau,
1997). From these equations a three-term recurrence relation results:

$$A v_n = T_{n-1,n} v_{n-1} + T_{n,n} v_n + T_{n+1,n} v_{n+1} \qquad (5.4)$$

$$A^* w_n = S_{n-1,n} w_{n-1} + S_{n,n} w_n + S_{n+1,n} w_{n+1} \qquad (5.5)$$

Therefore, the iteration starts with arbitrary vectors $v_1, w_1$ such that
$v_1^* w_1 = 1$, and sets $T_{1,0} = T_{0,1} = 0$ and $v_0 = w_0 = 0$. Then, for each
$n = 1, 2, \ldots$, set $T_{n,n} = w_n^* A v_n$, determine $v_{n+1}, w_{n+1}$ from
equations 5.4-5.5 (up to a constant), then find $T_{n+1,n}$, $T_{n,n+1}$ subject to the
normalization $v_{n+1}^* w_{n+1} = 1$, again using 5.4-5.5. Consequently, the subspaces
$v_n \in \langle v_1, A v_1, \ldots, A^{n-1} v_1 \rangle$ and
$w_n \in \langle w_1, A^* w_1, \ldots, (A^*)^{n-1} w_1 \rangle$
are formed (Trefethen and Bau, 1997). QMR chooses the normalized initial residual
$(b - A x_0)$ as the initial $v_1$, and also introduces a weighted scaling matrix (Freund
and Nachtigal, 1991). This algorithm uses a look-ahead Lanczos iteration in order to
avoid near-breakdowns (see footnote 3) (Freund and Nachtigal, 1991). A disadvantage of
this method is that it needs the computation of $A^*$ (5.2), and subsequently also
requires $M^*$ for preconditioner M. Each iteration of QMR requires two matrix-vector
multiplications.

BiCGSTAB(/) 

The BiConjugate Gradient STABilized method uses an $\ell$-degree minimum residual (MR)
polynomial (Sleijpen and Fokkema, 1993) and is yet another Krylov subspace
method. This is a variant of the BiConjugate Gradient (BCG) method
(Trefethen and Bau, 1997). In BCG, the initial $v_1$ is chosen as $b$. This has the effect
of choosing $x_n$ in the same subspace as GMRES (i.e.
$\langle b, Ab, \ldots, A^{n-1} b \rangle$), but requiring the residual to be orthogonal
to the $\langle w_1, A^* w_1, \ldots, (A^*)^{n-1} w_1 \rangle$ subspace rather than
minimizing its norm (Trefethen and Bau, 1997). The BiCGSTAB($\ell$) algorithm tries to
both smooth the convergence rate of BCG, and account for the near-breakdown situations
(Trefethen and Bau, 1997). The BiCGSTAB($\ell$) algorithm has one inner loop that
consists of two parts. In the first part, termed the "BCG part," new BCG vectors are
computed implicitly by computing the iteration coefficients $\alpha$ and $\beta$
explicitly (Sleijpen and Fokkema, 1993). In the second part, termed the "MR part," a
locally minimum residual is calculated using the Minimum Residual approach (Sleijpen and
Fokkema, 1993). At the end of the inner loop, the residuals and BCG vectors (including
the current solution) are updated (Sleijpen and Fokkema, 1993). Each outer step of
BiCGSTAB($\ell$) requires $2\ell$ matrix-vector multiplications, and so becomes more
expensive for higher $\ell$ (Sleijpen and Fokkema, 1993). More information can be
obtained from Sleijpen (2009) and Sleijpen and Fokkema (1993).

^3 Near-breakdown happens when $w_{n+1}^* v_{n+1} \approx 0$ while
$v_{n+1} \not\approx 0$ and $w_{n+1} \not\approx 0$.

5.2.2  Description  of  DG-specific  Preconditioners 

A preconditioner, M, is a matrix that approximates A and is easy to invert. Left
preconditioning involves left multiplication of the original linear system by
$M^{-1}$. That is, $Ax = b$ becomes $M^{-1}Ax = M^{-1}b$. For $M = A$,
$M^{-1}Ax = Ix = x = A^{-1}b$ and the problem is solved exactly, and for $M = I$, the
original problem remains.


Block  Jacobi 

One of the most important preconditioners used in practice is the Jacobi
preconditioner (Trefethen and Bau, 1997). Here the preconditioner M is taken as the
diagonal entries of the A matrix. This is easily inverted, and for some problems it can
give significant improvements in computational time. For DG-discretized systems, a
block-Jacobi preconditioner may be used, where M is taken as the $N_p \times N_p$ blocks
on the diagonal of A. It is more expensive to invert this M, but the inversion can be
done on an individual block-by-block basis. The expense of this inversion becomes
non-trivial when using higher-order basis functions. In Persson and Peraire (2008), the
cost for computing a Jacobi factorization (including the cost of computing the matrix A)
is given as $(2/3) N_p^3 N_t$, where $N_t$ is the number of elements in the
discretization, and $N_p$ is the number of degrees of freedom in an element. The cost of
$M^{-1}x$ is also given in Persson and Peraire (2008) as $2 N_p^2 N_t$.
Block Gauss-Seidel


The Block Gauss-Seidel (GS) preconditioner is similar to the Block Jacobi
preconditioner, but keeps both the diagonal blocks and all the lower (or upper) blocks.
For some specific problems, usually pure advection, the GS preconditioner has been
shown to perform significantly better than the Jacobi preconditioner, but for
general problems only a marginal improvement can be expected (Persson and Peraire,
2008). The inverse of this matrix is difficult to obtain directly, but by using a block
back-solve, the effect of the inverse can be obtained. The cost for computing the
preconditioner is again given from Persson and Peraire (2008) as $(2/3) N_p^3 N_t$, and
the cost of $M^{-1}x$ is given as $(D + 3) N_p^2 N_t$, where D is the dimension of the
problem.
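
The action of the two block preconditioners described above can be sketched as follows. This is a minimal plain-Python illustration for a small dense-stored matrix, assuming equal block size Np on every element. The helper solve_dense, the function names and the 4 x 4 example matrix are hypothetical, and the per-block solves stand in for the pre-computed block factorizations that an efficient implementation would use.

```python
def solve_dense(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        acc = aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = acc / aug[i][i]
    return z


def block_jacobi_apply(A, r, Np):
    """Return z = M^{-1} r, where M keeps only the Np x Np diagonal blocks of A."""
    z = []
    for start in range(0, len(r), Np):
        idx = range(start, start + Np)
        block = [[A[i][j] for j in idx] for i in idx]
        z.extend(solve_dense(block, [r[i] for i in idx]))
    return z


def block_gauss_seidel_apply(A, r, Np):
    """Return z = M^{-1} r, where M is the block lower triangle of A
    (diagonal blocks included), using a block forward solve."""
    n = len(r)
    z = [0.0] * n
    for start in range(0, n, Np):
        idx = range(start, start + Np)
        # subtract the contribution of the blocks that were already solved
        rhs = [r[i] - sum(A[i][j] * z[j] for j in range(start)) for i in idx]
        block = [[A[i][j] for j in idx] for i in idx]
        z[start:start + Np] = solve_dense(block, rhs)
    return z


if __name__ == "__main__":
    # Hypothetical two-element system with Np = 2
    A = [[2.0, 1.0, 0.5, 0.0],
         [1.0, 3.0, 0.0, 0.5],
         [0.5, 0.0, 4.0, 0.0],
         [0.0, 0.5, 1.0, 2.0]]
    r = [3.0, 4.0, 8.0, 4.0]
    print(block_jacobi_apply(A, r, 2))
    print(block_gauss_seidel_apply(A, r, 2))
```

The only difference between the two routines is the subtraction of the already-updated blocks, which is the extra work reflected in the higher application cost quoted for the GS preconditioner.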


ILU(0)

In general, when computing the LU factorization of a sparse matrix, the sparsity
of the matrix is lost in the computed L and U matrices. That is, L and U are more
dense and require additional storage. For large systems of equations, this additional
storage is prohibitive, not to mention the cost of essentially directly computing the
inverse of A. The incomplete LU factorization with zero fill-in calculates the LU
factorization of the matrix A, but does not allow the newly computed L and U
matrices to have a sparsity pattern different from that of A. Not only does this
reduce the storage requirements, but also the computational cost. Also, with a DG
discretization, making additional assumptions about the mesh can further reduce the
computational cost of the ILU(0) factorization (Persson and Peraire, 2008). The cost
for computing the ILU(0) preconditioner is greater than that of the Jacobi and GS
preconditioners, and is given from Persson and Peraire (2008) as
$(2D + 8/3) N_p^3 N_t$. The cost of computing $M^{-1}x$ is also given in Persson and
Peraire (2008) as $(2D + 4) N_p^2 N_t$, which is the same as calculating the
matrix-vector product Ax.
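
The zero fill-in rule can be made concrete with a small example. The sketch below is a scalar (non-block) ILU(0) in plain Python on a dense-stored matrix, following the standard row-wise (IKJ) elimination restricted to the non-zero pattern of A. It is not MATLAB's ilu routine nor the block variant of Persson and Peraire (2008); the function name and the test matrices are assumptions for illustration, and no pivoting is performed.

```python
def ilu0(A):
    """Incomplete LU factorization with zero fill-in.

    Returns (L, U) with unit-diagonal L. Entries are updated only where A
    itself is non-zero, so L + U keeps the sparsity pattern of A.
    """
    n = len(A)
    W = [row[:] for row in A]
    pattern = [[A[i][j] != 0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue                      # no entry here: nothing to eliminate
            W[i][k] /= W[k][k]                # multiplier, stored in the L part
            for j in range(k + 1, n):
                if pattern[i][j]:             # update allowed only inside the pattern
                    W[i][j] -= W[i][k] * W[k][j]
    L = [[W[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[W[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # "Arrow" matrix: a complete LU would fill positions (1,2) and (2,1)
    A = [[4.0, 1.0, 1.0],
         [1.0, 4.0, 0.0],
         [1.0, 0.0, 4.0]]
    L, U = ilu0(A)
    print(L)
    print(U)
    print(matmul(L, U))   # differs from A only where fill-in was discarded
```

For a tridiagonal matrix a complete LU factorization creates no fill-in, so ILU(0) reproduces it exactly; for the arrow matrix above the product LU matches A only on the original non-zero pattern.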
p-Multi-Grid (MG)

MG preconditioners are typically good at handling low-frequency components
of the original problem, leaving the high frequencies to be solved by other means
(Trefethen and Bau, 1997). MG preconditioners are usually calculated by solving a
fine-grid problem on a coarser grid, and then transferring the coarse-grid solution back
onto the fine grid (Trefethen and Bau, 1997). This transferral is trivial for
structured-grid discretizations, but for unstructured grids, the transferral of the
solution is less straightforward. Fortunately, for DG discretizations with high-order
basis functions, an alternative exists. Instead of projecting the solution onto a coarse
grid, the solution can be projected onto a lower-order basis function. This method is
used successfully by a number of researchers, including Persson and Peraire (2008). The
cost for this preconditioner is not given in Persson and Peraire (2008), but depends on
the order of basis function (p), the number of elements $N_t$, and the solution method
(usually sparse-direct) used for the coarse grid.

5.3  Results 

Equation 3.30 is discretized using a nodal DG FEM scheme with appropriate
boundary conditions as discussed in Chapter 3. The exact problem being solved is
discussed in detail in Chapter 4. The resultant discretized system of equations gives
rise to a block matrix structure, each block being associated with a single element.
Due to the definition of the fluxes $F_{inv}$ and $F_{vis}$, there are also
off-block-diagonal entries in the matrix. The number of off-diagonal entries will depend
on the type of discretization used for the viscous terms, where an LDG discretization
will lead to a maximum of nine off-diagonal block entries, a CDG or IP discretization
will lead to a maximum of three off-diagonal block entries, and an HDG discretization
will lead to a smaller system with smaller blocks and a maximum of four off-diagonal
block entries. In all cases, for a large number of elements, the final matrix will be
reasonably sparse. An example of the sparsity pattern for both an 8-element and a
104-element discretization on a structured triangular grid using the LDG fluxes is shown
in Figure 5-1.


Figure 5-1: Matrix sparsity patterns for fourth order (p = 4) basis functions with 8
(left), and 104 (right) elements

The size of the blocks depends on the order, p, of the basis functions, the type
of element (triangular or quadrilateral), and the dimension of the problem. The
number of unknowns $N_p$ in a two-dimensional triangular element scales as
$N_p = \frac{1}{2}(p+1)(p+2)$, while the number of unknowns on each edge scales as
$N_{fp} = p + 1$ for one-dimensional edge elements. As an example, different order
triangular elements are plotted in Figure 5-2.
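
As a quick check of these counts, the short Python sketch below evaluates both formulas and the resulting system size for the two meshes of Figure 5-1. The function names are hypothetical, and the global size assumes one unknown field with $N_p$ unknowns on every element.

```python
def dofs_per_triangle(p):
    """Number of unknowns N_p on a triangle with order-p basis functions."""
    return (p + 1) * (p + 2) // 2


def dofs_per_edge(p):
    """Number of unknowns N_fp on one edge of the triangle."""
    return p + 1


if __name__ == "__main__":
    p = 4
    for n_elements in (8, 104):
        size = dofs_per_triangle(p) * n_elements
        print(f"p={p}, Nt={n_elements}: Np={dofs_per_triangle(p)}, "
              f"Nfp={dofs_per_edge(p)}, matrix size={size} x {size}")
```

For p = 4 each block is 15 x 15, so under these assumptions the 8-element and 104-element matrices have 120 and 1560 rows respectively.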


Figure 5-2: Location of unknowns on master triangle for various orders (p) of basis
functions
The MATLAB implementations are briefly discussed below. As a cautionary note,
the efficiency of the various schemes does not reflect a realistic, optimized
implementation. Some effort was expended in order to compare the cost of different
preconditioners, but the main criterion for preconditioner selection should be the
iteration count.

5.3.1  Constructing  the  A  Matrix 

An implementation for explicit time integration can be used directly by iterative
matrix solvers because the explicit implementation essentially provides an Ax
matrix-vector multiplication. Nonetheless, for the purposes of this study it was
convenient to have the matrix available for forming the various preconditioners. Also,
with the matrix available, various properties of the matrix, including the sparsity
pattern and eigenvalues, could be easily examined. Thus, A is formed in all cases for
this study. However, in practice a DG-specific matrix-free implementation is desirable,
since it is possible to take advantage of the structure of the DG-specific matrix to
improve efficiency and reduce storage.

5.3.2  Preconditioners 

A number of preconditioners were tested. Most were implemented by us, while
the MATLAB implementation of the ILU(0) preconditioner was used. The following
preconditioners were tested:

1. None: M = I.

2. Upper: M=triu(A). This is the non-block version of the Block GS preconditioner,
and uses the upper triangle of A.

3. Lower: M=tril(A). This is the non-block version of the Block GS preconditioner,
and uses the lower triangle of A.

4. Jacobi: M=diag(diag(A)). This is the classical Jacobi preconditioner.

5. ILU(0): [M1,M2]=ilu(A,setup) with setup.type='nofill'. This is the ILU(0)
implementation in MATLAB.

6. Block ILU(0): This is essentially the same as the Block Jacobi preconditioner,
but the LU factorization is used to compare the computation time of various
implementations.

7. Block Jacobi: This preconditioner is described in §5.2.2. Two implementations
are used, and are described below.

8. Block GS: This preconditioner is described in §5.2.2. Three implementations of
this preconditioner were used, and are described below.

9. MG: This preconditioner is described in §5.2.2, and the implementation is
discussed below.

5.3.3  Block  Jacobi  Preconditioner 

Essentially three implementations of this preconditioner were used, and it was
expected that the same number of iterations would result for the different
implementations, while the computational time would differ. This enabled the comparison
of computation time for the different implementations. Since neither an efficient Block
GS nor p-MG implementation existed in MATLAB, having a similarly-implemented
Block Jacobi algorithm allows the comparison between preconditioners based on the
number of iterations, with the ability to extrapolate that result to the computational
time of an efficient implementation of the algorithm.

The first two implementations were expected to have similar computational time.
In the first "BlockILU" implementation, the LU factorizations of the block diagonals
were computed, and supplied to the solver functions as M1 and M2 respectively (see
§5.2.1 for the solver syntax, and the meaning of M1, M2). The second "BlockJacobi"
implementation simply supplied the Block Jacobi matrix as M1, with M2 left empty. The
third "BlockJacobi2" implementation passed a "function handle" instead of a matrix
to the solver functions. The function passed in computes the product $M^{-1}x$ and
returns the result. For this implementation, $M^{-1}$ was pre-computed block-by-block
and passed to the function. Hence, the only computational expense was due to the
matrix-vector multiplication.
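
The function-handle idea behind the third implementation can be sketched as a closure. The plain-Python example below pre-computes the inverse of every diagonal block once and returns a function that applies $M^{-1}$ using only small dense matrix-vector products. It is an illustration under assumed names (invert_dense, make_block_jacobi_handle) with a hypothetical 4 x 4 matrix, not the MATLAB code used in this study.

```python
def invert_dense(M):
    """Gauss-Jordan inverse with partial pivoting (small dense blocks only)."""
    n = len(M)
    aug = [list(M[i]) + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def make_block_jacobi_handle(A, Np):
    """Pre-compute the inverse of each Np x Np diagonal block of A and return
    a function r -> M^{-1} r that performs one dense mat-vec per block."""
    inv_blocks = []
    for start in range(0, len(A), Np):
        idx = range(start, start + Np)
        inv_blocks.append(invert_dense([[A[i][j] for j in idx] for i in idx]))

    def apply(r):
        z = []
        for k, inv in enumerate(inv_blocks):
            rk = r[k * Np:(k + 1) * Np]
            z.extend(sum(a * b for a, b in zip(row, rk)) for row in inv)
        return z

    return apply


if __name__ == "__main__":
    A = [[2.0, 1.0, 0.5, 0.0],
         [1.0, 3.0, 0.0, 0.5],
         [0.5, 0.0, 4.0, 0.0],
         [0.0, 0.5, 1.0, 2.0]]
    apply_preconditioner = make_block_jacobi_handle(A, 2)   # setup cost paid once
    print(apply_preconditioner([3.0, 4.0, 8.0, 4.0]))       # cheap repeated use
```

Separating the one-time setup from the repeated application mirrors how an iterative solver calls the preconditioner once per iteration.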

5.3.4  Block  Gauss-Seidel  Preconditioner 

The first MATLAB implementation of the Block GS preconditioner is similar to
the second implementation of the Block Jacobi preconditioner, that is, M1 is supplied
to the solver with M2 left blank. Here M1 contains the lower triangular blocks of
A (that is, the lower triangle of A including the upper portions of the
block-diagonal entries). This implementation was found to run prohibitively slowly, but
was useful for debugging the other implementations. The main difficulty with this
preconditioner was obtaining an efficient implementation for testing purposes. A
number of implementations were attempted, and these are described below.

The first attempt at a more efficient implementation was similar to the third Block
Jacobi implementation, that is, a function that performs the product $M_1^{-1}x$ was
passed to the solver. Here, the inverses of the blocks on the diagonal are precomputed
and stored in M1, and the MATLAB code is as follows:


1    x(1:Np) = M1(1:Np,1:Np)*x(1:Np);              % first block: no coupling
2    for k=2:Nt
3        range = (k-1)*Np+1:k*Np;                  % unknowns of element k
4        x(range) = M1(range,1:Np)*( x(range) ...
5            - A(range,1:(k-1)*Np)*x(1:(k-1)*Np) ); % forward substitution
6    end


Unfortunately, this implementation took even longer to run than the original.
After writing a benchmarking script it was identified that line 5 of the above algorithm
was responsible. This operation consisted of the multiplication of the sparse
lower-diagonal blocks with the newly-solved-for vector. Even though the matrix was
small, it was a sparse-matrix multiply, which has some associated overhead in MATLAB,
causing the computation to slow considerably. Coding this algorithm in C and using
it in MATLAB through the mex interface also did not prove helpful. Because a sparse
multiplication was not used in the C implementation, this code still ran prohibitively
slowly. To partly overcome the overhead problem in the MATLAB implementation,
an if statement was included to only do the multiplication when necessary. The
modifications to the algorithm are as follows:


1    x(1:Np) = M1(1:Np,1:Np)*x(1:Np);
2    for k=2:Nt
3        range = (k-1)*Np+1:k*Np;
3.1      if nnz(A(range(1),1:(k-1)*Np))            % any coupling to earlier blocks?
4            x(range) = M1(range,1:Np)*( x(range) ...
5                - A(range,1:(k-1)*Np)*x(1:(k-1)*Np) );
5.1      else
5.2          x(range) = M1(range,1:Np)*x(range);
5.3      end
6    end


Unfortunately, the overhead of the if statement and the nnz function was nearly
the same as the overhead of the sparse-multiply, so there were only marginal savings.
This implementation was tested but never used since it still ran prohibitively slowly.
In the next attempt, instead of using the sparse entries of A, the off-block-diagonal
entries were reconstructed using the DG operators. The code is as follows:


1  x(1:Np) = M1(1:Np,1:Np)*x(1:Np);
2  for k = 2:K
3      range = (k-1)*Np+1 : k*Np;
4      x(range) = M1(range,1:Np)*( x(range) ...
5          + LIFT*(Scale(range).*x(vmapP(range))) );
6  end


Here LIFT, Scale, and vmapP are a matrix operator, a scaling-factor array, and
an index-of-neighboring-nodes array, respectively. Essentially, instead of invoking the
sparse-matrix multiplication routines, this algorithm rebuilds the entries of the
matrix while collecting only the necessary data (using vmapP). This results in faster,
dense-matrix multiplications. This implementation, “blockGS,” only includes advective
terms, since the inclusion of diffusive terms requires significant re-coding due to
the intermediate q variable, and in practice would be implemented along with the
function performing the matrix-free Ax multiplication. This Block GS preconditioner is
expected to perform best for advection-dominated flows (Persson and Peraire, 2008),
improving the rate of convergence more than the pure Jacobi preconditioner. The
decrease in run time for the new implementation is substantial: the time per
multiplication drops by one to two orders of magnitude, a 10-100 fold decrease in
computational time. Hence, this type of implementation should be used for realistic
applications.
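The idea of sweeping through the blocks with precomputed diagonal inverses, while gathering only the neighbor couplings that actually exist, can be sketched in Python as follows. This is a minimal illustration and not the thesis code: the names `block_gs_sweep`, `inv_diag`, and `lower` are hypothetical, the blocks are plain lists of lists, and the usual Gauss-Seidel sign convention (couplings subtracted from the right-hand side) is assumed, whereas the listings above fold the sign into the stored operators.

```python
def matvec(M, v):
    """Dense matrix-vector product for small list-of-lists blocks."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def block_gs_sweep(inv_diag, lower, b):
    """Forward block Gauss-Seidel sweep: solve (D + L) x = b block by block.

    inv_diag[k] : precomputed inverse of the k-th diagonal block
    lower[k]    : dict {j: block} of couplings to already-updated blocks j < k
    b           : right-hand side, one list per block
    """
    x = []
    for k, rhs in enumerate(b):
        acc = list(rhs)
        # Use only the couplings that exist, as small dense products,
        # instead of one large sparse lower-triangular multiply.
        for j, block in lower[k].items():
            contrib = matvec(block, x[j])
            acc = [a - c for a, c in zip(acc, contrib)]
        x.append(matvec(inv_diag[k], acc))
    return x


# Two 2x2 blocks: D1 = 2I, D2 = 4I, and an identity coupling from block 0 to block 1.
inv_diag = [[[0.5, 0.0], [0.0, 0.5]], [[0.25, 0.0], [0.0, 0.25]]]
lower = [{}, {0: [[1.0, 0.0], [0.0, 1.0]]}]
b = [[2.0, 4.0], [5.0, 10.0]]
x = block_gs_sweep(inv_diag, lower, b)
```

Storing the couplings per block mirrors the role of vmapP above: each block touches only the data of its neighbors, so every product stays small and dense.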

With the Block GS implementation, it was also possible to include the flux
contributions of the un-updated x vector. However, leaving in the flux contributions
of the un-updated x vector resulted in poorer performance, hence they were removed
by a switch (incorporated through the scale-factor array).

Finally, in order to also include the diffusive effects with the block GS
preconditioner, a final implementation where the entire lower-block matrix is
pre-inverted was utilized. This was the standard implementation used for these
studies, and all numbers reported use this implementation.
5.3.5 p-MG Preconditioner


The first p-MG implementation, “MG,” was similar to the third implementations
of both the Jacobi and GS preconditioners. A function computing $M^{-1}x$ was passed
to the solver. Here $M^{-1} = \mathcal{R}_{1 \to p_h} A_1^{-1} \mathcal{R}_{p_h \to 1}$, where $\mathcal{R}_{p_h \to 1}x$ restricts x from
a higher order basis function ($p = p_h$) to a first order basis function ($p = 1$), and
$\mathcal{R}_{1 \to p_h}x$ prolongates the solution. In this implementation, $A_1^{-1}$ was pre-computed,
so the only costs of the $M^{-1}x$ calculations were due to the restriction/prolongation
and matrix-vector multiplications.

Note that the projection/restriction operators are normally defined for a modal basis
function, where $u(x) = \sum_j \hat{u}_j \psi_j(x)$. The restriction operator is then easily defined
by simply truncating the number of modes. This is different from an interpolation,
and caution is required when implementing a p-MG scheme when using a nodal basis.

The nodal basis used for this work can be presented as $V\hat{u} = u$, where $V_{ij} = \psi_j(x_i)$
is a generalized Vandermonde matrix, and $u_i$ is the approximate value of u
at the nodal points $x_i$. Interpolation of the solution works as follows:

$$
\begin{aligned}
u_1^1 &= V_{N1}\hat{u}^N = V_{N1} V_N^{-1} u^N \\
u_1^N &= V_{1N}\hat{u}_1 = V_{1N} V_1^{-1} u_1^1
\end{aligned}
\qquad (5.6)
$$

where $(V_N)_{hj} = \psi_j(x_h^N)$, $(V_{N1})_{ij} = \psi_j(x_i^1)$, $(V_1)_{ik} = \psi_k(x_i^1)$, $(V_{1N})_{jk} = \psi_k(x_j^N)$,
$h = 1, \ldots, N_p$, $j = 1, \ldots, N_p$, $i = 1, 2, 3$, and $k = 1, 2, 3$. Here $x^1$ represents the nodal
points for the p = 1 basis, and $x^N$ represents the nodal points for the p = N basis,
and the subscript $(\cdot)_1$ on u indicates the approximate solution on the p = 1 basis.

Interpolation is NOT the same as restriction/prolongation, however, because the
interpolant fails to remove the modes that cannot be solved for on the coarser p = 1
grid. That is, in the interpolating case, the higher order basis is not contained in the
lower order basis. This has consequences for the Galerkin formulation used, where
the residuals on both the low and high order bases are set orthogonal to their own
basis. In the interpolating case, when solving on the coarse grid, one attempts to set
information from the higher order basis orthogonal to the lower order basis. However,
the information contained in the higher order basis is not contained in the lower order
basis, and the correction calculated on the coarser grid then attempts to correct for
the errors from the higher order modes even though it cannot. If a collocation scheme
were used instead, this interpolating strategy might be appropriate, but here we need
to be more careful. Instead, the operations should be as follows:


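$$
\begin{aligned}
\hat{u}^N &= V_N^{-1} u^N \\
\hat{u}_1 &= \mathcal{E}^T \hat{u}^N \\
u_1^1 &= V_1 \hat{u}_1 = V_1 \mathcal{E}^T V_N^{-1} u^N \\
\hat{e}_1 &= V_1^{-1} e_1^1 \\
\hat{e}^N &= \mathcal{E} \hat{e}_1 \\
e^N &= V_N \hat{e}^N = V_N \mathcal{E} V_1^{-1} e_1^1
\end{aligned}
\qquad (5.7)
$$

The only difference from the case of the interpolation is that here we have included an
$\mathcal{E}$ matrix. $\mathcal{E} \in \Re^{N_p \times 3}$ is basically a cutoff filter, and for this particular implementation,

$$
\mathcal{E}_{ij} =
\begin{cases}
1 & (i,j) \in \{(1,1),\ (2,2),\ (N+2,3)\} \\
0 & \text{otherwise}
\end{cases}
\qquad (5.8)
$$

With the $\mathcal{E}$ matrix included, the correct restriction/prolongation operator is defined.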
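The difference between interpolating and truncating modes can be seen in a small one-dimensional analogue, sketched below in Python. The setup is an assumption made for illustration only: unnormalized Legendre polynomials stand in for the two-dimensional Koornwinder basis, the fine basis is p = 2 on the nodes [-1, 0, 1], the coarse basis is p = 1 on the nodes [-1, 1], and all function names are hypothetical.

```python
def legendre_modes(x):
    """First three (unnormalized) Legendre polynomials evaluated at x."""
    return [1.0, x, 0.5 * (3.0 * x * x - 1.0)]


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (M[r][n] - tail) / M[r][r]
    return y


FINE_NODES = [-1.0, 0.0, 1.0]   # p = 2 nodal points (assumed)
COARSE_NODES = [-1.0, 1.0]      # p = 1 nodal points (assumed)


def to_modal(u_fine):
    """Modal coefficients from nodal values: solve V u_hat = u, V_ij = psi_j(x_i)."""
    V = [legendre_modes(x) for x in FINE_NODES]
    return solve(V, u_fine)


def restrict_by_interpolation(u_fine):
    """Evaluate the full p = 2 expansion at the p = 1 nodes."""
    u_hat = to_modal(u_fine)
    return [sum(c * m for c, m in zip(u_hat, legendre_modes(x)))
            for x in COARSE_NODES]


def restrict_by_truncation(u_fine):
    """Drop the p = 2 mode (cutoff filter), then evaluate at the p = 1 nodes."""
    u_hat = to_modal(u_fine)[:2]
    return [sum(c * m for c, m in zip(u_hat, legendre_modes(x)[:2]))
            for x in COARSE_NODES]


# A field made purely of the highest mode, sampled at the fine nodes.
u_high = [legendre_modes(x)[2] for x in FINE_NODES]
interpolated = restrict_by_interpolation(u_high)
truncated = restrict_by_truncation(u_high)
```

For this purely high-order field, interpolation hands nonzero values to the coarse level even though the p = 1 basis cannot represent the mode, while truncation removes it entirely, which is the behavior the cutoff filter is meant to enforce.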

A more realistic implementation was also attempted, where the residual $r = b - Ax$
is instead solved on the coarse grid, $e = (\mathcal{R}^T A_1^{-1} \mathcal{R})\,r$, and the correction is applied as
$x = e + x$. Some additional coding was required, but the function from the “MG”
implementation could be re-used. The new MG implementation was combined with
ILU(0) preconditioned GMRES(m). It was desirable to use BiCGSTAB(l); however,
the function from Sleijpen (2009) would not accept an initial guess, and hence could
not be restarted in a loop. Hence, GMRES(m) was used instead. The algorithm was
as follows:


for i = 1, 2, ...
    [X] = gmres(A,B,RESTART,TOL,MAXIT,M1,M2,X);  % Smoothing
    Resid = B - A*X;                             % Residual
    Resid_1 = R*Resid;                           % Restriction
    E_1 = inv(A_1)*Resid_1;                      % Coarse grid solution
    E = R'*E_1;                                  % Prolongation
    X = X + E;                                   % Correction
end


In one step, the scheme is $x = (I - \mathcal{R}^T A_1^{-1} \mathcal{R} A)\,x + \mathcal{R}^T A_1^{-1} \mathcal{R}\,b$. Here $\mathcal{R}$ is a restriction
operator and its transpose, $\mathcal{R}^T$, is the prolongation operator; $A_1$ is the matrix to
solve the problem using p = 1 order basis functions, and A is the matrix to solve
the problem for a higher order basis function; x is the approximate solution at the
current iteration; and b is the right-hand side vector of the Ax = b system.
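The smoothing, restriction, coarse solve, prolongation, and correction steps can be sketched in Python on a small stand-in problem. Several choices here are assumptions made only for illustration: a damped Jacobi smoother replaces the preconditioned GMRES(m) smoother, the coarse matrix is formed as $\mathcal{R} A \mathcal{R}^T$ rather than taken from a p = 1 discretization, A is a 4x4 one-dimensional Laplacian, and R aggregates pairs of unknowns. All names are hypothetical.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (M[r][n] - tail) / M[r][r]
    return y


def residual(A, b, x):
    return [bi - ai for bi, ai in zip(b, matvec(A, x))]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def jacobi_smooth(A, b, x, sweeps=2, omega=2.0 / 3.0):
    """Damped Jacobi sweeps (a stand-in for the GMRES(m) smoother)."""
    for _ in range(sweeps):
        r = residual(A, b, x)
        x = [xi + omega * ri / A[i][i] for i, (xi, ri) in enumerate(zip(x, r))]
    return x


def two_level_cycle(A, b, x, R, A1):
    x = jacobi_smooth(A, b, x)               # smoothing
    r1 = matvec(R, residual(A, b, x))        # residual, then restriction
    e1 = solve(A1, r1)                       # coarse grid solution
    e = matvec(transpose(R), e1)             # prolongation
    return [xi + ei for xi, ei in zip(x, e)] # correction


A = [[2.0, -1.0, 0.0, 0.0], [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0], [0.0, 0.0, -1.0, 2.0]]
R = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
A1 = matmul(matmul(R, A), transpose(R))      # assumed Galerkin coarse operator
b = [1.0, 1.0, 1.0, 1.0]


def run_cycles(n_cycles):
    x = [0.0] * len(b)
    for _ in range(n_cycles):
        x = two_level_cycle(A, b, x, R, A1)
    return x
```

Calling `run_cycles(20)` and evaluating `norm(residual(A, b, x))` shows the residual shrinking as cycles are added. With a symmetric positive definite A and this coarse operator, the coarse correction cannot increase the error in the energy norm, so the smoother alone bounds the convergence rate.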

5.3.6  Numerical  Experiments 

A  brief  description  of  the  functions  and  scripts  written  for  this  study  can  be  found 
in  Appendix  B. 

A number of numerical benchmarks and tests were run, and each is briefly
described below. The primary benchmark starts with a wide range of solver,
preconditioner, and flow configurations to identify promising directions to investigate.
The benchmarks that follow test progressively fewer combinations, until only two solver
choices with one preconditioner remain. The performance of these two
solver/preconditioner combinations is then analyzed to identify whether additional
improvements can be made to the convergence rate. Finally, to see whether any further
improvement is possible, a combined ILU(0) and MG preconditioner is tested.

The numerical experiments were conducted on a 2.4 GHz Intel desktop computer
running Windows XP. The domain used has 757 triangles and 1162 faces, and uses
4th order basis functions for a total of 11,355 degrees of freedom. The HDG
implementation reduces the number of global unknowns to 5,810.

The primary benchmark was run with both the HDG and LDG discretizations.
The magnitude of the velocity varies throughout the domain, and the scaling factor
is taken as one for cases where advection is turned on, and zero when advection is
turned off. The diffusivity is taken as one when diffusion is on and zero when it is
off.

All the LDG simulations are initialized with the previous timestep, whereas the
HDG simulations are initialized using a zero vector for the input. At large values
of timestep size, the initial guess vector will not have a large impact on the solution
time. In the reported results the system is solved once for the LDG discretization,
and three times for the HDG discretization. For the rest of the numerical tests, only
the performance of LDG discretizations was examined.

Primary  benchmark 

The primary benchmark tests the performance of all the solvers for all the
different preconditioners for different flow parameters, a total of 900 combinations. In
summary, the following parameters were varied:

• Discretization: [LDG, HDG]

• Solvers: [MATLAB’s “slash” solver, GMRES(m) with m = 20, BiCGSTAB(l)
with l = 10, QMR]

• Preconditioners: All the preconditioners listed in §5.3.2.

• Advection: [No advection, advection]

• Diffusion: [No diffusion, diffusion]

• dt = [0.2, 2, 20, 200, 2000]

Note that cases with large timestep sizes are examined because they are important
to advance solutions toward steady-state, and this also follows the work of Persson
and Peraire (2008). This benchmark was run twice and the number of iterations was
consistent.

GMRES(m) and BiCGSTAB(l) restart benchmark

The purpose of the “restart” benchmark was to find the best value of m and
l for GMRES(m) and BiCGSTAB(l) respectively, with a total of 585 tests. This
benchmark varied the following parameters:

• Discretization: LDG

• Solvers: [GMRES(m) with various m, BiCGSTAB(l) with various l]

• m, 2 × l = [2, 4, 8, 10, 12, 14, 16, 18, 20, 24, 30, 40, 50]

• Preconditioner: ILU(0)

• Advection: [No advection, advection]

• Diffusion: [No diffusion, diffusion]

• dt = [0.2, 2, 20, 200, 2000]

This benchmark was run twice and the number of iterations was consistent.

BiCGSTAB(l) with l = 5, 9 ILU benchmark

Here the behavior of BiCGSTAB(l) with l = 5, 9 using different ILU preconditioners
was examined for advection-diffusion with a CFL number of 10,000. In this case, the
computation time of the preconditioner was important, and hence was included in
the benchmark time. The following parameters were varied:

• ILU factorization options

setup.type = ['nofill', 'ilutp']

setup.droptol = [10^0, 10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6, 10^-7, 10^-8, 10^-9,
10^-10]
Convergence history test


In the convergence history test, GMRES(m) with m = 20, BiCGSTAB(l) with
l = 10, and QMR are compared with and without the same ILU(0) or p-MG
preconditioner for an LDG discretization. Only a timestep size of 200 is considered, and all
three flow regimes are considered. The convergence history is plotted with and
without the preconditioner (for both ILU(0) and p-MG), and the 50 largest and smallest
scaled eigenvalues of the ILU(0) preconditioned and un-preconditioned A matrix are
plotted on the complex plane.

MG with ILU(0) smoother

In this test, a proper MG scheme is combined with a preconditioned GMRES(m)
smoother, and the convergence rates are examined for all flow regimes with a timestep
size of 2000. For reference, all simulations are plotted along with both the
convergence rate of ILU(0) preconditioned GMRES(m) and naive p-MG preconditioned
GMRES(m). The following were varied:

• GMRES(m) preconditioner: [Naive MG, none, ILU(0)]

• Fine grid basis function order: [4, 2]

5.3.7  Discussion  and  Results 

Primary  benchmark  results  and  discussion 

The results of the primary benchmark for HDG can be found in Tables A.2 -
A.4, and the results for LDG can be found in Tables A.5 - A.7. Note that the
benchmark time results for MATLAB’s “slash” operator are also included. Also
note that the “slash” solver is not preconditioned, but the times recorded serve to
quantify the variability of the benchmark time, as well as give a basis of comparison
for a fast solver. MATLAB’s “slash” operator is likely using a sparse direct solver for
this problem. Note that BiCGSTAB(l) and GMRES(m) are competitive with the
“slash” operator at small timestep sizes (or low CFL numbers). Also note that, in the
“Summary” column, the iterations for QMR and BiCGSTAB(l) are multiplied by a
factor of 2 because they require 2 matrix-vector (MV) multiplications per iteration,
whereas GMRES(m) only requires one. Hence, the “Min MV” column reports the
minimum number of matrix-vector multiplications, but only serves to give an
approximate cost of the method, because both GMRES(m) and BiCGSTAB(l) become more
expensive for larger m and l respectively.
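The conversion from iteration counts to matrix-vector multiplications described above can be written as a short Python helper. The function names are hypothetical and the iteration counts in the example are made up for illustration; they are not values from Tables A.2 - A.7.

```python
# Matrix-vector multiplications per iteration, as stated in the text:
# QMR and BiCGSTAB(l) need two, GMRES(m) needs one.
MV_PER_ITERATION = {"GMRES": 1, "BiCGSTAB": 2, "QMR": 2}


def matvec_count(solver, iterations):
    """Approximate cost of a run in matrix-vector multiplications."""
    return MV_PER_ITERATION[solver] * iterations


def min_mv(results):
    """results: {solver: iterations} -> (solver with fewest MV products, count)."""
    counts = {s: matvec_count(s, it) for s, it in results.items()}
    best = min(counts, key=counts.get)
    return best, counts[best]


# Made-up iteration counts, for illustration only.
example = {"GMRES": 30, "BiCGSTAB": 12, "QMR": 40}
best_solver, best_count = min_mv(example)
```

As the text notes, this count is only a rough proxy: it ignores the extra work per iteration that grows with m and l.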

A number of simulations did not converge within the specified maximum iteration
count. This will be addressed once good solver-preconditioner combinations are found.
In particular, the QMR solver was the least robust, failing to converge for the largest
number of cases, whereas BiCGSTAB(l) appeared to be the most robust.

The traditional preconditioners do not generally perform as well as their block
counterparts. While the traditional preconditioners tend to decrease the number of
iterations to convergence over the unpreconditioned case, the block preconditioners
decrease the number of iterations by a larger factor.

As expected, the two Block Jacobi implementations and the block ILU
preconditioner converge in a consistent number of iterations with the same residual; however,
the times for each vary by as much as a factor of 10 (see dt=2000, BiCGSTAB(l),
Table A.4). Because the Jacobi2 implementation is up to 10 times slower than the
Jacobi implementation for BiCGSTAB(l), the times for the GS preconditioner can be
taken to be on the order of ten times smaller for BiCGSTAB(l) because the
implementations are similar.

For some cases the Jacobi2 implementation converged in fewer iterations (for
example see dt=2000, BiCGSTAB(l), Table A.4), but these are exceptional cases
where the Jacobi2 implementation converged in one outer iteration of BiCGSTAB(l)
before the other implementations, and this may be explained by rounding errors in
the calculation of the residual, since the residual is used as a stopping criterion at
the end of each outer iteration. The value of the residual is larger for the Jacobi2
implementation which converged in fewer iterations, and this supports the proposed
hypothesis. However, the reason why the Jacobi2 implementation converged one outer
iteration sooner is not completely clear.
Focusing on the number of iterations, where fewer iterations indicate better
performance, the Block GS preconditioner generally performs better than the Block
Jacobi preconditioners for all flow regimes with both the LDG and HDG
implementations. However, the GS preconditioner is outperformed by the ILU(0) preconditioner
in all cases for the HDG discretization. For the LDG implementation, however,
the ILU(0) preconditioner sometimes performs better, usually for smaller timestep
sizes and more advective flows, while the GS preconditioner seems better for larger
timesteps. In the LDG implementation, though, the naive implementation of the
MG preconditioner gives the best consistent performance. Thus, for the HDG
implementation, the ILU(0) preconditioner seems to perform best, while for the LDG
implementation the naive implementation of the MG preconditioner performs best.

The excellent performance of the naive implementation of the MG preconditioner
for the LDG implementation was unexpected. Initially, when the interpolating
restriction/prolongation operators were used, poor performance was observed for the
MG preconditioner. However, while the MG preconditioner improved the rate of
convergence for the HDG implementation over the case where no preconditioner was
used, it was consistently outperformed by the ILU(0) preconditioner. The mismatch
between the performance of the MG preconditioner for the two different
implementations indicates either that the good performance for the LDG implementation is
special, or that the restriction/prolongation operators are incorrectly specified for the
HDG implementation. Since the HDG method maps information from the interior
of an element to its edges, the restriction/prolongation of the solution on the edges
may not follow the same rules as the method discussed in Section 5.3.5 for the
restriction/prolongation of the solution on the elements. The excellent performance of the
MG preconditioner on the LDG discretization may be explained by the orthogonality
of the Koornwinder modal basis and the linearity of the problem, in which case the
MG preconditioner applied naively ($M^{-1}Ax = M^{-1}b$) essentially solves the lower modes
directly. For non-linear problems, the same result cannot be expected. Then, the
linearity property may be lost for the HDG implementation, which could explain why
the naive MG implementation does not perform equally well. In either case, further
investigation is warranted, but is beyond the scope of this thesis.

Comparing the performance of the HDG implementation to the LDG
implementation, it can be seen that the HDG implementation converged for more cases than the
LDG implementation. Also, keeping in mind that the HDG implementation solved
the system three times whereas the LDG implementation solved the system only once,
the HDG implementation tended to converge in fewer iterations and completed the
calculations faster than the LDG implementation for the same preconditioner.
However, due to the excellent performance of the MG preconditioner, for some cases with
timestep sizes larger than 20 (CFL larger than 100) the LDG implementation was
found to be more efficient than the HDG implementation. This means that either a
more competitive preconditioner for the HDG implementation needs to be found, or
that, for an iterative solution method, there is no clear winner between the LDG and
HDG methods.

For small values of the timestep size, the ILU(0) preconditioned GMRES(m) or
BiCGSTAB(l) solvers converge acceptably fast for both discretizations, and only
problems with large timestep size still require a better preconditioner.

This section identified the ILU(0) and p-MG preconditioners using the GMRES(m)
or BiCGSTAB(l) solvers as promising directions to investigate. Examining the
results, it can also be seen that pure diffusive cases are more difficult to solve, and
a better preconditioning scheme is needed to handle these cases. Additionally, the
HDG discretization converged more robustly and faster than the LDG discretization
when using the ILU(0) preconditioner; however, it did not see a drastic improvement
in performance using the MG preconditioner whereas the LDG discretization did.

GMRES(m) and BiCGSTAB(l) restart benchmark results and discussion

No clear winner for the choice of solver was evident from the primary benchmark,
but both GMRES(m) and BiCGSTAB(l) performed better than QMR. Since these
algorithms depend on m and l respectively, m and l were varied to see if an additional
gain in performance could be realized while using the ILU(0) preconditioner. The
results are reported in Tables A.8 - A.10.




For pure advection at small timestep sizes, GMRES(m) was slightly faster than
BiCGSTAB(l) but the performance was comparable (note the residual is smaller for
BiCGSTAB(l) for these cases). At large timestep sizes, BiCGSTAB(l) with l ≈ 7
performed consistently well, converging within the specified iteration tolerance for all
timestep sizes, whereas GMRES(m) failed to converge within the specified iteration
tolerance for the largest timestep size (dt = 2000). Interestingly, BiCGSTAB(l) with
large values of l seems less robust, since the cases with large l did not converge for
all pure advective cases.

For pure diffusion, neither GMRES(m) nor BiCGSTAB(l) converged at large
timestep sizes. The residuals were similar for both solvers at the small timestep
sizes, although BiCGSTAB(l) had one order of magnitude smaller residuals at the
largest timestep size, but the performance was comparable. At small timestep sizes,
GMRES(m) was slightly faster, but the performance was again similar. Overall, the
performance was poor for both solvers, and a better preconditioner is required for
diffusive regimes when using a large timestep size.

The results for the advection-diffusion case were similar to the pure diffusion case,
suggesting the flow regime chosen is more diffusion-dominated. For the
advection-diffusion case, however, fewer converged solutions resulted; specifically, BiCGSTAB(l)
failed to converge for more cases than in the pure diffusive case. The final residual is
smaller for GMRES(m) with m = 50 compared to BiCGSTAB(l) for all cases where
the timestep size was smaller than 2000. However, at a timestep size of 2000, the
smallest residual for BiCGSTAB(l) was an order of magnitude smaller than the
smallest GMRES(m) residual. Regardless, a better preconditioner is required to handle
the solution of the diffusive terms.

Overall, for GMRES(m) small values of m resulted in faster convergence, whereas
larger values of m seemed to converge more robustly. For BiCGSTAB(l),
l ≈ 7 is a good choice. The total performance of the two solvers was similar, but
because BiCGSTAB(l) was more robust, it is the preferred solver.
[Plot: Convergence versus Matrix-Vector Multiplies]

Figure 5-3: Residual history of different solvers with and without ILU(0)
preconditioner for advection only flow regime with timestep size of 200

BiCGSTAB(l) with l = 5, 9 ILU benchmark

The ILU(0) preconditioner was superior in computational time compared to all
the other ILU factorizations attempted.

Convergence  history  test  results  and  discussion 

The convergence histories using the ILU(0) preconditioner are plotted in Figures
5-3 to 5-5 for each flow regime, and the convergence histories using the p-MG
preconditioner are plotted in Figures 5-6 to 5-8 for each flow regime.

[Plot: Convergence versus Matrix-Vector Multiplies: Diffusion Only]

Figure 5-4: Residual history of different solvers with and without ILU(0)
preconditioner for diffusion only flow regime with timestep size of 200

[Plot: Convergence versus Matrix-Vector Multiplies: Advection-Diffusion]

Figure 5-5: Residual history of different solvers with and without ILU(0)
preconditioner for advection-diffusion flow regime with timestep size of 200

Figure 5-6: Residual history of different solvers with and without p-MG
preconditioner for advection only flow regime with timestep size of 200

Figure 5-7: Residual history of different solvers with and without p-MG
preconditioner for diffusion only flow regime with timestep size of 200

Figure 5-8: Residual history of different solvers with and without p-MG
preconditioner for advection-diffusion flow regime with timestep size of 200

The most notable feature is the rapid convergence of p-MG preconditioned
GMRES(m) and BiCGSTAB(l). QMR does not see an improvement from the p-MG
preconditioner, and this may be explained by the fact that the QMR implementation
also needs the transposed preconditioner operation supplied to it. This suggests that
either the implementation is incorrect, or that a naive implementation is not sufficient,
and more care needs to be taken with the restriction/prolongation portion of this
preconditioner when implementing the function performing the transposed
matrix-vector multiply. QMR converged slowly and smoothly for most cases but diverged for
some cases. GMRES(m) tended to converge smoothly and monotonically as expected,
whereas BiCGSTAB(l) would converge erratically but with a general downward trend
for most cases.

The p-MG preconditioner improved the convergence rate considerably for all flow
regimes, with BiCGSTAB(l) converging within one outer iteration and GMRES(m)
converging within 15 iterations. While these results are favorable, it is still worth
considering the ILU(0) preconditioner, since the p-MG result may only hold for the
special case of linear equations, and a more general preconditioner capable of handling
non-linear problems is desired. While it is expected that the p-MG preconditioner
would improve the rate of convergence for diffusive problems, the improvement for the
advective case was not expected, and may be due to the linearity of the problem. Also,
the p-MG preconditioner did not improve the convergence of the HDG discretization,
hence we examine the performance of the ILU(0) preconditioner next.

The ILU(0) preconditioner improved the convergence rate for all the solvers only
in the pure advective case, and the improvement was not as impressive as with
the p-MG preconditioner. The ILU(0) preconditioner seemed to improve the initial
reduction of the residual for all cases, but for the advection-diffusion case, the rate of
convergence for GMRES(m) is clearly slower for the preconditioned matrix. To
gain insight into why this happens, we examine the eigenvalues of the preconditioned
and unpreconditioned matrices, which are reported in Figures 5-9 to 5-11.

[Plot: 100 Largest and Smallest Eigenvalues: Advection Only; markers: × unconditioned A,
○ conditioned A]

Figure 5-9: Eigenvalues of conditioned and unconditioned A matrices for pure
advection with timestep size 200. The eigenvalues are normalized by $\lambda^* = \lambda/\Lambda_{max}$,
where $\Lambda_{max} = \max\Re\{\lambda\} - \min\Re\{\lambda\}$ is the maximum range of the real component
of the eigenvalues of both matrices

Note that the eigenvalues in Figures 5-9 to 5-11 are normalized by the maximum range
of the real component of the eigenvalues of both matrices. From Trefethen and
Bau (1997), the convergence of GMRES(m) is improved when the eigenvalues are
localized and do not surround the origin. From Figure 5-9, it can be seen that
the ILU(0) preconditioner seems to localize the eigenvalues of the original system
considerably, whereas Figures 5-10 and 5-11 still show that the eigenvalues have large
imaginary components. Whether or not the preconditioned eigenvalues surround the
origin is not clear from Figures 5-9 to 5-11, and thus a different normalization of the
eigenvalues is plotted in Figures 5-12 to 5-14. Here the eigenvalues are normalized
by the maximum absolute value of the real part of the eigenvalue belonging to the
specific matrix.

From  Figure  5-12  we  see  that  for  pure  advection  the  normalized  preconditioned 
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Figure  5-10:  Eigenvalues  of  conditioned  and  unconditioned  A  matrices  for  pure  dif¬ 
fusion  with  timestep  size  200.  The  eigenvalues  are  normalized  by  =  \/Amax, 
where  A^ax  =  max3?{A}  —  min3?{A}  is  the  maximum  range  of  the  real  component 
of  the  eigenvalues  of  both  matrices 
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Figure  5-11:  Eigenvalues  of  conditioned  and  unconditioned  A  matrices  for  advection- 
diffusion  with  timestep  size  200.  The  eigenvalues  A^  are  normalized  by  As  =  X/A^ax, 
where  Amax  =  max3fJ{A}  —  min3fJ{A}  is  the  maximum  range  of  the  real  component 
of  the  eigenvalues  of  both  matrices 
[Figure 5-12 plot. Panel title: 100 Largest and Smallest Eigenvalues: Advection Only]

Figure 5-12: Eigenvalues of conditioned and unconditioned A matrices for pure advection with timestep size 200. The eigenvalues $\lambda$ are normalized by $\lambda^* = \lambda/\lambda_{max}$, where $\lambda_{max} = \max \Re\{\lambda\}$ is the maximum absolute value of the real part of the eigenvalue belonging to the specific matrix.

[Figure 5-13 plot. Panel title: 100 Largest and Smallest Eigenvalues: Diffusion Only. Legend: * Unconditioned A, O Conditioned A. Horizontal axis: real part of the eigenvalues.]

Figure 5-13: Eigenvalues of conditioned and unconditioned A matrices for pure diffusion with timestep size 200. The eigenvalues $\lambda$ are normalized by $\lambda^* = \lambda/\lambda_{max}$, where $\lambda_{max} = \max \Re\{\lambda\}$ is the maximum absolute value of the real part of the eigenvalue belonging to the specific matrix.

Figure 5-14: Eigenvalues of conditioned and unconditioned A matrices for advection-diffusion with timestep size 200. The eigenvalues $\lambda$ are normalized by $\lambda^* = \lambda/\lambda_{max}$, where $\lambda_{max} = \max \Re\{\lambda\}$ is the maximum absolute value of the real part of the eigenvalue belonging to the specific matrix.

eigenvalues do not surround the origin and are generally more localized, even though
the relative imaginary components of the eigenvalues are larger. In the cases with
diffusion, Figures 5-13 and 5-14 show that the preconditioned eigenvalues surround the
origin, and that the relative size of the imaginary components of the eigenvalues of the
preconditioned matrix is larger. More importantly, the preconditioner also pushes the
smallest eigenvalues closer to zero, such that the ratio $\lambda_{max}/\lambda_{min}$ is increased.
Therefore the ILU(0) preconditioner does not favorably scale the matrix for the cases
where diffusion is involved. This suggests that a better preconditioner is required to
handle the diffusive part of the flow.
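
The two diagnostics used in this argument can be written down as a small Python sketch. It assumes that the ratio is taken between eigenvalue magnitudes, and it uses the presence of real parts of both signs as a crude proxy for a spectrum that surrounds the origin. The function names and the sample spectra are hypothetical and are not values from the benchmark.

```python
def magnitude_ratio(eigs):
    """Ratio of the largest to the smallest eigenvalue magnitude."""
    mags = [abs(lam) for lam in eigs]
    smallest = min(mags)
    if smallest == 0.0:
        return float("inf")
    return max(mags) / smallest


def real_parts_straddle_zero(eigs):
    """Crude proxy for a spectrum surrounding the origin: real parts of both signs exist."""
    reals = [lam.real for lam in eigs]
    return min(reals) < 0.0 < max(reals)


# Hypothetical spectra, for illustration only.
before = [complex(1, 0), complex(2, 0.1), complex(4, 0)]
after = [complex(-0.5, 0.5), complex(0.01, 0), complex(1, 1)]

print(magnitude_ratio(before), real_parts_straddle_zero(before))
print(magnitude_ratio(after), real_parts_straddle_zero(after))
```

A preconditioner that raises the ratio and makes the spectrum straddle zero, as in the second sample, would be expected to slow a Krylov solver rather than help it.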

MG  with  preconditioned  GMRES(m)  smoother  results  and  discussion 

All  results  are  reported  in  Appendix  C.  Note  that  in  the  legend  “MG”  refers  to 
the  properly  implemented  MG  preconditioner  with  a  GMRES(m)  smoother,  where 
the  preconditioner  used  for  the  smoother  is  indicated  in  the  caption,  and  “MG  naive” 
refers  to  the  naive  MG  implementation  used  during  the  Primary  benchmark. 

In all cases, the naive MG preconditioner is considerably better than any other
combination. It is rivaled only by the proper MG implementation when the
GMRES(m) smoother is preconditioned by the naive MG preconditioner, in which
case the naive MG preconditioner is doing most of the work. Considering that the
primary benchmark indicated that the naive MG preconditioner did not work for the
HDG implementation, the cases where the naive MG preconditioner is not used are
also examined.

Consider first the case where the fine-grid discretization uses fourth order basis
functions. For pure advection, if the GMRES(m) smoother is not preconditioned, the
solver diverges. However, the ILU(0) preconditioned GMRES(m) smoother combined with
the proper MG scheme increases the rate of convergence over using ILU(0)
preconditioned GMRES(m) without the MG correction. The pure diffusion case diverges
with the proper MG preconditioner using either unpreconditioned or ILU(0)
preconditioned GMRES(m). The advection-diffusion case diverges when GMRES(m) is
preconditioned with ILU(0). When GMRES(m) is not preconditioned and only uses the
proper MG scheme, it converges faster than if GMRES(m) uses only the ILU(0)
preconditioner (that is, without the MG correction). These disheartening results
indicate that a single solution scheme does not suffice, and a good preconditioner for
the pure diffusive case has not been found. In all cases, the residual is increased
during the MG correction, even though the overall rate of convergence is increased.
The increase of the residual after the MG correction is suspected to be due to the
large jump in grid sizes between the p = 4 and p = 1 discretizations. Hence, the MG
scheme is also examined for a fine-grid discretization using p = 2 order basis
functions.

Consider next the case where the fine-grid discretization uses second order basis
functions, still ignoring cases where the naive MG preconditioner is used. In this
case, the ILU(0) preconditioned GMRES(m) smoother combined with the proper MG
preconditioner gives better results than using only ILU(0) preconditioned GMRES(m).
The pure advection case converges within 600 matrix-vector multiplies, and the
advection-diffusion case converges within 100. For the pure diffusion case, the
simulation does not converge within 1000 matrix-vector multiplies, but the final
residual is nearly two orders of magnitude lower than when using only the ILU(0)
preconditioned GMRES(m) solver, and the rate of convergence is faster. The
advection-diffusion case did not see an increase in the residual after the MG
correction, whereas the pure advection and pure diffusion cases still saw the
increase. This result shows that when the difference between the orders of the basis
used on the coarse and fine grid discretizations is not as large, improved convergence
can be realized using the proper MG implementation. This suggests that a hierarchical
p-MG scheme (V or W MG, for example) could be the best choice.
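
As an illustration of what such a hierarchy could look like, the following Python sketch builds the sequence of polynomial orders and the order in which a V-cycle would visit them. The function names and the `step` parameter are hypothetical; the sketch only generates the level schedule and does not implement the smoother or the transfer operators.

```python
def p_levels(p_fine, p_coarse=1, step=1):
    """Polynomial orders from finest to coarsest, lowering p by `step` per level."""
    if p_fine < p_coarse or step < 1:
        raise ValueError("need p_fine >= p_coarse and step >= 1")
    levels = list(range(p_fine, p_coarse, -step))
    levels.append(p_coarse)  # always end on the coarsest order
    return levels


def v_cycle_order(levels):
    """Visit order of a V-cycle: down to the coarsest level, then back up."""
    return levels + levels[-2::-1]


# Two-level scheme with a large jump, as in the p = 4 to p = 1 experiment.
print(p_levels(4, step=3))
# Hierarchical scheme with a change of one order between levels.
print(p_levels(4, step=1))
print(v_cycle_order(p_levels(4, step=1)))
```

Choosing a small `step` keeps the change in p between neighbouring levels small, which is the property the p = 2 experiment suggests is beneficial.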


5.4  Conclusions  and  Recommendations 

The best preconditioner found for the LDG discretization is the naive p-MG
preconditioner, and for the HDG implementation the best preconditioner found was the
ILU(0) preconditioner, although proper p-MG schemes were not examined for HDG. For
both discretizations, the BiCGSTAB($\ell$) solver seemed to be the most robust, and gave
the most consistently good performance.

It was found that the HDG discretized matrices were faster to solve, with fewer
iterations, in cases where the naive p-MG preconditioner was not used for the LDG
discretization. With the naive p-MG preconditioner, the LDG discretized matrices
can be solved more efficiently than the HDG discretized matrices for timestep sizes
larger than 20 (CFL approximately 100).

Values of $\ell \approx 7$ seemed to work best for BiCGSTAB($\ell$), whereas for GMRES(m)
smaller values of m resulted in faster convergence, and larger values of m resulted in
more robust performance.

For properly implemented MG schemes, this work suggests that a hierarchical p-MG
scheme, where the change in p between levels is not too large, may improve the rate
of convergence when using an ILU(0) preconditioned GMRES(m) smoother.

It is recommended that the naive MG preconditioner be examined in detail in order
to extend it for use with the HDG discretization and general non-linear problems.
The benchmarks for LDG should also be repeated for an HDG discretization in order
to find a good preconditioner for HDG. Further examination of the proper p-MG
implementation for HDG is also warranted if the naive p-MG implementation is
found not to work. A hierarchical proper p-MG scheme might be necessary.

The work presented here enables the efficient implicit solution of advection-diffusion
problems, as needed for biogeochemical reaction equations in the ocean. Also, equations
such as the incompressible Navier-Stokes equations can now be solved efficiently with
the procedure described.
Chapter 6


Conclusions 


The purpose of this thesis is to identify promising numerical methods that are
suitable for multiscale ocean predictions. In order to fulfill this purpose, current
efforts towards creating new ocean models are reviewed, an understanding of the most
promising methods used by other researchers is developed, the most promising existing
methods are studied and applied to idealized cases, new methods are incubated
and evaluated by solving biogeochemical advection-diffusion-reaction equations, and
efficient solver/preconditioner combinations for inverting DG FEM matrices are
identified.

From our quantitative incubation of numerical schemes, a number of recommendations
on the tools necessary to solve dynamical equations for multiscale ocean
predictions are provided, and a summary follows.

6.1  Summary  of  Results 

Most of the second generation ocean models reviewed use some form of the FEM.
The FEM models are more sophisticated than their FV counterparts but, due to the
added complexity, are less mature. The sophistication of the FEM models arises from
the freedom to use higher order schemes, and the freedom to choose the space of the
solution and test functions. The FV models reviewed are more mature and
ready to be used for realistic problems.

The DG FEM is a promising numerical method for developing the next generation
of ocean models if the efficiency constraints can be overcome. The DG FEM
offers efficient data structures for parallel implementations, higher order accuracy,
geometric flexibility enabling sophisticated adaptive algorithms, and superconvergence
properties for dispersion and dissipation, making the method particularly well suited
to advection-dominated flows.

The DG FEM was implemented for ocean biogeochemical reaction equations which,
to our knowledge, is the first time this has been done. The numerical
implementation was verified using a number of test cases, and the LSRK time
integration scheme was found to be more accurate than the first order Euler time
integration scheme.

A purely advective test case (the advection of a cosine bell) was used to
demonstrate that a higher order scheme can be more accurate, more efficient, and use
fewer degrees of freedom than a lower order scheme. It was concluded that high and low
order schemes should also be compared on an efficiency-accuracy basis to complement
the DOF-efficiency based comparison.
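
To make the DOF side of this comparison concrete, the following Python sketch counts unknowns for a DG discretization. It assumes triangular elements with a full polynomial basis of total degree p, which gives (p+1)(p+2)/2 unknowns per element, and it relies on the fact that DG does not share unknowns between elements. The element type, the function names, and the mesh sizes are assumptions chosen for illustration, not values from the test case.

```python
def dofs_per_triangle(p):
    """Number of basis functions of total degree <= p on a triangle."""
    return (p + 1) * (p + 2) // 2


def total_dofs(num_elements, p):
    """Total unknowns per constituent; DG elements do not share unknowns."""
    return num_elements * dofs_per_triangle(p)


# Hypothetical meshes: a fine low order mesh and a coarse high order mesh.
low_order = total_dofs(num_elements=1600, p=1)
high_order = total_dofs(num_elements=256, p=4)
print(low_order, high_order)
```

A DOF count alone says nothing about cost per unknown, which is why the efficiency-accuracy comparison is recommended as a complement.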

It was shown that a p-adaptive scheme using different orders of basis functions for
different constituents on the same element is promising for improving the efficiency
and the accuracy of the solution. However, it was also shown that such schemes need
to consider the cost of additional volume and edge interpolation operations when
formulating the adaptation criterion.

It was argued that adaptive algorithms are necessary to resolve important small
scale features which would go unnoticed if a coarse non-adaptive scheme were used.

While explicit time integration schemes were sufficient for the advective
operators, the stability constraints associated with the diffusive terms were
prohibitively expensive. It was concluded that implicit time integration schemes are
necessary when small values of the grid Peclet number (large values of $\kappa$) are required.
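
The scaling behind this conclusion can be illustrated with a short Python sketch. It assumes the common definition of the grid Peclet number, Pe = |v| h / kappa, and the usual order-of-magnitude explicit limits dt ~ h/|v| for advection and dt ~ h^2/kappa for diffusion. The constants, and the dependence on the polynomial order p that a DG scheme introduces, are deliberately omitted, and the function names and sample values are hypothetical.

```python
def grid_peclet(speed, h, kappa):
    """Grid Peclet number Pe = |v| h / kappa (infinite for zero diffusivity)."""
    if kappa == 0.0:
        return float("inf")
    return abs(speed) * h / kappa


def explicit_dt_limits(speed, h, kappa):
    """Order-of-magnitude explicit timestep limits (constants and p-dependence omitted)."""
    dt_adv = h / abs(speed) if speed != 0.0 else float("inf")
    dt_diff = h * h / kappa if kappa != 0.0 else float("inf")
    return dt_adv, dt_diff


# Hypothetical values: unit velocity, unit diffusivity, grid size 0.01.
pe = grid_peclet(speed=1.0, h=0.01, kappa=1.0)
dt_adv, dt_diff = explicit_dt_limits(speed=1.0, h=0.01, kappa=1.0)
print(pe, dt_adv, dt_diff)
```

Under these assumptions the ratio of the diffusive to the advective limit equals the grid Peclet number, so a small Peclet number means the diffusive term dictates a much smaller explicit timestep.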

For an LDG discretization, it was found that a naive p-MG preconditioner was
optimum, whereas the ILU(0) preconditioner was the best preconditioner found for
an HDG discretization. For both discretizations, the BiCGSTAB($\ell$) solver with $\ell \approx 7$
offered consistently efficient and robust performance, being slightly better than a
GMRES(m) solver, and much better than a QMR solver. HDG discretized systems
were found to be faster to solve, with fewer iterations, than LDG discretized systems
whenever the naive LDG p-MG preconditioner was not used. The naively p-MG
preconditioned LDG solves were sometimes faster than the HDG solves for timestep
sizes larger than approximately 20 (or CFL number approximately 100). It was argued
that a hierarchical p-MG scheme may improve the rate of convergence when using an
ILU(0) preconditioned GMRES(m) smoother.


6.2  Recommendations 

It is recommended that a mature FV model such as SUNTANS or FVCOM, or a
mature FEM model such as SELFE, ADCIRC, or FEOM, be used if unstructured grids
are necessary for immediate application. Applications that require unstructured grids
would normally be found in regions with complex geometries or bottom topography.

Adopting a sophisticated adaptive model such as SLIM or ICOM for near future
use is recommended. The flexibility and accuracy of these adaptive models promise
to widen the range of ocean processes that can be studied, specifically processes that
depend on multiple scales.

Additional examination of DG FEMs for ocean simulations is recommended due to
the advantages of these methods. In particular, HDG methods seem to be a promising
avenue to explore. While the efficiency of DG methods may not be as good as that of
compact FD schemes, the additional geometric flexibility of DG methods warrants
additional attention. It is recommended that DG methods be studied on adaptive
structured grids for ocean models and their efficiency compared to traditional
structured grid schemes. This approach also enables the accurate treatment of complex
geometries by having an unstructured grid at boundaries for accuracy, while
maintaining an efficient structured mesh for the bulk of the unknowns in the interior.

It is recommended that high-order quadrature-free adaptive algorithms be used
whenever possible for optimum accuracy and efficiency. Also, explicit time integration
is recommended for advective operators, whereas implicit time integration schemes
are recommended for diffusive operators due to prohibitively expensive numerical
stability constraints for small grid Peclet numbers.
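
One simple way to combine the two treatments is a first order implicit-explicit (IMEX) Euler step, sketched below in Python for a single Fourier mode of the one-dimensional advection-diffusion equation. This scheme is an assumption chosen for brevity and is not necessarily the scheme used in this work; the function names and parameter values are hypothetical. For a mode with wavenumber k, the advective term contributes -i v k and the diffusive term contributes -kappa k^2 to the growth rate.

```python
def explicit_euler_factor(v, kappa, k, dt):
    """Amplification factor when both terms are advanced with forward Euler."""
    return 1.0 + dt * complex(-kappa * k * k, -v * k)


def imex_euler_factor(v, kappa, k, dt):
    """Amplification factor with explicit advection and implicit (backward Euler) diffusion."""
    return (1.0 + dt * complex(0.0, -v * k)) / (1.0 + dt * kappa * k * k)


# Hypothetical diffusion dominated mode with a timestep far above the explicit limit.
v, kappa, k, dt = 1.0, 100.0, 1.0, 0.1
print(abs(explicit_euler_factor(v, kappa, k, dt)))
print(abs(imex_euler_factor(v, kappa, k, dt)))
```

An amplification factor with magnitude above one means the mode grows without bound. In this example the fully explicit step is unstable while the IMEX step damps the mode, at the price of a linear solve for the implicit diffusive part at every step.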

It is also recommended that the naive MG preconditioner be examined in detail in
order to extend it for use with the HDG discretization and general non-linear problems.
Further examination of the proper p-MG implementation for HDG and non-linear
problems is also warranted if the naive p-MG implementation performs poorly. A
hierarchical proper p-MG scheme may be necessary.

6.3  Future  work 

A next step is to be able to solve both the physics and biology using HDG, so as to
explore biogeochemical ocean processes. This could allow two-dimensional idealized
studies of coupled physics-biology, possibly aiding parameter selection for realistic
three-dimensional simulations. Additionally, this code would serve as a test-bed for
adaptive algorithms.

The HDG method will also be explored. Using HDG discretized projection methods
may yield an efficient solution method for solving the incompressible Navier-Stokes
equations.

The  solution  of  two-dimensional  physics  could  be  extended  to  three  dimensions. 
Adaptive  oct-tree  algorithms  can  be  examined  for  their  efficiency  and  compared  to 
more  standard  unstructured  adaptive  algorithms. 
Table A.1: Detailed table of second generation ocean models
Table A.2: Primary preconditioner/solver benchmark results for $\kappa = 1$, $V_{scale} = 0$ using
HDG discretization. Red highlighting indicates the solution did not converge. The
fastest simulation for a given CFL number is highlighted in green, and the iteration
with the fewest matrix-vector multiplications is outlined in orange.


Entries marked -- are missing or illegible in the source.

Precond. | Slash Time | GmresR Time | GmresR Iter. | GmresR Resid. | Qmr Time | Qmr Iter. | Qmr Resid. | BiCgstab Time | BiCgstab Iter. | BiCgstab Resid. | Min Time | Min MV

dt=0.2
None | 0.4063 | 0.5156 | 182 | 8.7E-07 | 0.4063 | 118 | 8.2E-07 | 0.3281 | 90 | 4.0E-09 | 0.3281 | 180
Upper | 0.4375 | 0.2188 | 105 | 8.8E-07 | 0.4844 | 58 | 3.3E-07 | 0.2344 | 30 | 3.0E-08 | 0.2188 | 60
Lower | 0.4375 | 0.2031 | 105 | 8.4E-07 | 0.5000 | 58 | 3.0E-07 | 0.2656 | 30 | 2.9E-08 | 0.2031 | 60
Jacobi | 0.4375 | 0.2656 | 129 | 8.9E-07 | 0.3281 | 75 | 8.8E-07 | 0.2656 | 60 | 4.1E-10 | 0.2656 | 120
ILU0 | 0.4219 | 0.1406 | 87 | 8.6E-07 | 0.3594 | 30 | 4.6E-07 | 0.3125 | 30 | 9.0E-12 | -- | 60
BlockILU | 0.4375 | 0.3125 | 119 | 6.0E-07 | 0.4375 | 66 | 6.4E-07 | 0.2969 | 50 | 8.0E-07 | 0.2969 | 100
BlockJacobi | 0.4375 | 0.3750 | 119 | 6.0E-07 | 0.6563 | 66 | 6.4E-07 | 0.5156 | 50 | 8.0E-07 | 0.3750 | 100
BlockJacobi2 | 0.4375 | 0.6563 | 119 | 6.0E-07 | 1.0781 | 66 | 6.4E-07 | 3.0781 | 50 | 8.0E-07 | 0.6563 | 100
GS | 0.4375 | 0.2969 | 100 | 9.7E-07 | 0.5 | 48 | 7.1E-07 | 3.71875 | 30 | 2.4E-08 | 0.2969 | 60
Multigrid | 0.4063 | 2.7500 | 89 | 1.7E-07 | -- | -- | -- | 87.1563 | 30 | 1.8E-12 | 2.7500 | 60

dt=2
None | 0.4219 | 0.8750 | 229 | 8.4E-07 | 0.5938 | 160 | 9.8E-07 | 0.4688 | 100 | 5.5E-07 | 0.4688 | 200
Upper | 0.4375 | 0.3906 | 135 | 6.6E-07 | 0.7656 | 87 | 9.3E-07 | 0.5000 | 60 | 7.6E-10 | 0.3906 | 120
Lower | 0.4063 | 0.4063 | 135 | 7.5E-07 | 0.7344 | 89 | 9.9E-07 | 0.4531 | 60 | 7.3E-10 | 0.4063 | 120
Jacobi | 0.4063 | 0.4688 | 177 | 7.8E-07 | 0.5938 | 125 | 7.5E-07 | 0.3281 | 60 | 8.3E-07 | 0.3281 | 120
ILU0 | 0.4375 | 0.2813 | 109 | 9.4E-07 | 0.5625 | 51 | 6.8E-07 | 0.3438 | 30 | 7.5E-08 | -- | 60
BlockILU | 0.4219 | 0.4688 | 162 | 8.5E-07 | 0.7031 | 110 | 7.9E-07 | 0.4219 | 60 | 1.6E-07 | 0.4219 | 120
BlockJacobi | 0.4063 | 0.7344 | 162 | 8.5E-07 | 0.9844 | 110 | 7.8E-07 | 0.5938 | 60 | 1.6E-07 | 0.5938 | 120
BlockJacobi2 | 0.4063 | 1.1406 | 162 | 8.5E-07 | 1.8438 | 110 | 7.8E-07 | 3.6406 | 60 | 1.6E-07 | 1.1406 | 120
GS | 0.4375 | 0.5781 | 132 | 8.9E-07 | 0.85938 | 84 | 7.8E-07 | 7.09375 | 60 | 3.4E-10 | 0.5781 | 120
Multigrid | 0.4219 | 6.8906 | 136 | 9.3E-07 | -- | -- | -- | 166.0156 | 60 | 9.6E-10 | 6.8906 | 120

dt=20 

0.3906 

0.4063 

0.4063 

0.4375 

0.4063  ij 

0.4063 

0.4063 

0.4063 

0.3906 

0.4063 


None 

Upper 

Lower 

Jacobi 


LUO 


BlockILU 

BlockJacobi 

BlockJacobi2 

GS 

Multigrid 


dt=20 

1 .3594  403 
0.8281  213 

0,8906  213 
1.0625  322 
0,4844  147 
1,0938  257 
1 .3750  257 
2,0938  257 
1.0625  194 

20.3281  294 


^  None 
'  Upper 
^  Lower 
'  Jacobi 
'  ILUO 
BlockILU 
'  BlockJacobi 
'  BlockJacobi2 
^  GS 

’’  Multigrid 


dt=200 

0.3906 

0.4063 

0.4063 

0.4375 

0.4063 

0.4063 

0.3906 

0.4375 

0.4219 

0.3906 


1.4E+00  146 


None 

0.7656 

180 

5.8E-07 

0.7656 

360 

Upper 

0.5938 

90 

1.2E-07 

0.5938 

180 

Lower 

0,5625 

90 

2.2E-07 

0,5625 

180 

Jacobi 

0.7969 

150 

8.3E-08 

0.7969 

300 

ILUO  1 

0,51 56 

60 

4.7E-09 

'  0.48U[ 

120 

BlockILU 

0,8125 

120 

3.7E-08 

0,8125 

240 

BlockJacobi 

1.1250 

120 

3.7E-08 

1.1250 

240 

BlockJacobi2 

7,0938 

120 

3.8E-08 

2,0938 

240 

GS 

10.3125 

90 

6.6E-09 

1.0625 

180 

Multigrid 

349.4688 

130 

9.6E-07 

20.3281 

260 

0.4844 

120 

J  dt=200  1 

1  dt=200  1 

None 

2,0469 

520 

6.1E-07 

2,0469 

1040 

Upper 

1.3438 

240 

2.1E-07 

1 .3438 

480 

Lower 

1,4063 

240 

2.0E-07 

1,4063 

480 

Jacobi 

1,6563 

410 

4.7E-07 

1,6563 

820 

iuJo  1 

0.9063 

120 

6.9E-07 

0.9063[ 

240 

BlockILU 

1,5625 

310 

4.7E-07 

1,5625 

620 

BlockJacobi 

2.3750 

310 

4.7E-07 

2.3750 

620 

BlockJacobi2 

17.7656 

310 

5.0E-07 

8.2500 

620 

GS 

23,6563 

210 

7.5E-08 

3,2188 

420 

Multigrid 

1087,8594 

410 

7.5E-07 

###### 

820 

0.9063 

240 

J  dt=2000  1 

1  dt=2000  1 

None 

4,0938 

1040 

8.3E-07 

4,0938 

2080 

Upper 

2,6875 

470 

5.2E-07 

2,6875 

940 

Lower 

2.4844 

430 

2.0E-07 

2.4844 

860 

Jacobi 

4,3438 

870 

6.6E-07 

4,3438 

1740 

ILUO  1 

1.4844 

230 

1.4E-07 

1 .48441 

460 

BlockILU 

3.3438 

580 

3.9E-07 

3.3438 

1160 

BlockJacobi 

5,2813 

580 

3.9E-07 

5,2813 

1160 

BlockJacobi2 

34,2500 

600 

3.1E-07 

34,2500 

1200 

GS 

44.7969 

400 

3.2E-07 

17.0625 

800 

Multigrid 

3439,9063 

1320 

5.4E-05 

###### 

2640 
Table A.3: Primary preconditioner/solver benchmark results for $\kappa = 0$, $V_{scale} = 1$ using
HDG discretization. Red highlighting indicates the solution did not converge. The
fastest simulation for a given CFL number is highlighted in green, and the iteration
with the fewest matrix-vector multiplications is outlined in orange.


Entries marked -- are missing or illegible in the source.

Precond. | Slash Time | GmresR Time | GmresR Iter. | GmresR Resid. | Qmr Time | Qmr Iter. | Qmr Resid. | BiCgstab Time | BiCgstab Iter. | BiCgstab Resid. | Min Time | Min MV

dt=0.2
None | 0.4375 | 0.5781 | 212 | 9.9E-07 | -- | -- | -- | 0.5938 | 90 | 1.0E-07 | 0.5781 | 180
Upper | 0.4219 | 0.2031 | 96 | 9.6E-07 | 0.3750 | 38 | 9.7E-07 | 0.2500 | 30 | 5.1E-11 | 0.2031 | 60
Lower | 0.4375 | 0.1250 | 90 | 3.4E-07 | 0.2813 | 31 | 5.1E-07 | 0.2188 | 30 | 2.1E-13 | 0.1250 | 60
Jacobi | 0.4375 | 0.1719 | 105 | 8.1E-07 | 0.2500 | 49 | 4.8E-07 | 0.1875 | 30 | 5.5E-09 | 0.1719 | 60
ILU0 | 0.4219 | 0.0938 | 78 | 1.7E-07 | 0.2188 | 18 | 1.1E-07 | 0.2500 | 30 | 6.0E-16 | 0.0938 | 36
BlockILU | 0.4375 | 0.1406 | 98 | 7.5E-07 | 0.2813 | 40 | 8.5E-07 | 0.1719 | 30 | 2.5E-10 | 0.1406 | 60
BlockJacobi | 0.4375 | 0.2188 | 98 | 7.5E-07 | 0.4063 | 41 | 3.8E-07 | 0.2969 | 30 | 2.5E-10 | 0.2188 | 60
BlockJacobi2 | 0.4063 | 0.4063 | 98 | 7.5E-07 | 0.7344 | 41 | 3.8E-07 | 1.8906 | 30 | 6.6E-10 | 0.4063 | 60
GS | 0.4219 | 0.2656 | 86 | 2.0E-07 | 0.32813 | 27 | 4.2E-07 | 3.734375 | 30 | 6.3E-15 | 0.2656 | 54
Multigrid | 0.4219 | 2.9844 | 94 | 9.8E-07 | -- | -- | -- | 85.8281 | 30 | 6.6E-11 | 2.9844 | 60

dt=2
None | 0.4219 | 2.0938 | 644 | 1.0E-06 | -- | -- | -- | 1.3750 | 280 | 2.7E-07 | 1.3750 | 560
Upper | 0.4219 | 0.6875 | 197 | 8.8E-07 | 1.1250 | 134 | 4.9E-07 | 0.5938 | 80 | 6.8E-07 | 0.5938 | 160
Lower | 0.4219 | 0.6094 | 170 | 7.2E-07 | 0.9063 | 111 | 6.4E-07 | 0.4844 | 70 | 5.6E-07 | 0.4844 | 140
Jacobi | 0.4375 | 0.7813 | 254 | 9.0E-07 | -- | -- | -- | 0.5469 | 110 | 5.9E-07 | 0.5469 | 220
ILU0 | 0.4375 | 0.2500 | 99 | 3.2E-07 | 0.4531 | 41 | 3.1E-07 | 0.3125 | 30 | 2.4E-12 | 0.2500 | 60
BlockILU | 0.4375 | 0.9844 | 180 | 7.4E-07 | 1.3281 | 125 | 8.8E-07 | 0.8281 | 80 | 6.0E-08 | 0.8281 | 160
BlockJacobi | 0.4375 | 0.8125 | 180 | 7.4E-07 | 1.2031 | 125 | 9.1E-07 | 0.8281 | 80 | 6.0E-08 | 0.8125 | 160
BlockJacobi2 | 0.4688 | 1.2656 | 180 | 7.4E-07 | 2.0781 | 125 | 9.1E-07 | 4.8438 | 80 | 6.0E-08 | 1.2656 | 160
GS | 0.4219 | 0.4688 | 124 | 6.8E-07 | 0.70313 | 67 | 8.6E-07 | 4.859375 | 40 | 6.2E-07 | 0.4688 | 80
Multigrid | 0.4219 | 8.0000 | 153 | 8.7E-07 | -- | -- | -- | 165.6719 | 60 | 1.5E-08 | 8.0000 | 120

dt=20 

0.4219 

0.4375 

0.4219 

0.4375 

0.4219 

0.4375 

0.4375 

0.4688 

0.4375 

0.4219 

dt=200 

0.4219 

0.4219 

0.4375 

0.4688 

0.4219 

0.4375 

0.4375 

0.4375 

0.4375 

0.4219 


dt=2000 

0.4219 

0.4375 

0.4375 

0.4375 

0.4375 

0.4375 

0.4375 

0.4375 

0.4219 

0.4375 


dt=20 

None 

Upper 

6,5469 

1356 

9.8E-07 

Lower 

5,6563 

1160 

9.8E-07 

Jacobi 

8,1875 

2065 

9.9E-07 

ILUO 

1,4531 

305 

9.5E-07 

BlockILU 

11,1094 

1522 

9.7E-07 

BlockJacobi 

9,2813 

1522 

9.7E-07 

BlockJacobi2 

15,0938 

1522 

9.7E-07 

GS 

4,0625 

691 

9.9E-07 

Multigrid 

50,3125 

644 

9.6E-07 

None 
Upper 
Lower 
Jacobi 
ILUO 
BlockILU 
BlockJacobi 
BlockJacobi2 
GS 

Multigrid 


None 

Upper 

Lower 

Jacobi 

ILUO 

BlockILU 

BlockJacobi 

BlockJacobi2 

GS 

Multigrid 


iLUO  I 

iBIocklLU 

I  BlockJacobi 
lBlockJacobi2 

Igs 

I  Multigrid 


ctt=20 
7.2813 
3.4219 
2.6406 
3.9531 
0.9375 
5.9688 
5.2344 
32.71 88 
32.7344 
772.3438 


1800 

560 

460 

820 

130 

580 

580 

570 

290 

290 


5.8E-06 

6.5E-07 

4.9E-07 

2.4E-07 

4.9E-08 

7.0E-07 

7.0E-07 

6.5E-07 

7.3E-07 

1.7E-07 


ctt=20 
7.2813  3600 
3.4219  1120 
2.6406  920 
3.9531  1640 
0.9375[j60 
'  5.9688  1160 
5.2344  1160 
15.0938  1140 
4.0625  580 
50.3125  580 


1  dt=200 

dt=200 

None  j 

8.0781 

1800 

1.5E-02 

1  8.07811  3600 

Upper 

10,9531 

1800 

3,8E+01 

10,9531 

3600 

Lower 

10.5469 

1800 

2,5E-hO0 

10.5469 

3600 

Jacobi 

8,5156 

1800 

2,5E+01 

8,5156 

3600 

ILUO 

1  10.1406 

1570 

1.5E-06 

10.1406 

[3140 

BlockILU 

19.3594 

1800 

5,OE-hOO 

19.3594 

3600 

BlockJacobi 

16,8750 

1800 

5,OE+00 

16,8750 

3600 

BlockJacobl2 

102.7969 

1800 

4,8E+00 

###### 

3600 

GS 

###### 

3600 

Multigrid 

4739,0469 

1800 

5,4E-06 

###### 

3600 

1  dt=2000  1 

dt=2000 l 

None 

52,4844 

1800 

52,4844 

3600 

Upper 

10,2344 

1800 

1,3E+02 

10,2344 

3600 

Lower 

9.8281 

1800 

2,2E-h01 

9.8281 

3600 

Jacobi  i 

7,3750 

1800 

4,9E+03 

7.3750J 

3600 

ILUO 

11.6719 

1800 

4.6E-02 

1  r671 9 

3600 

BlockILU 

117.9375 

1800 

###### 

3600 

BlockJacobi 

106,6719 

1800 

###### 

3600 

BiockJacobi2 

190.0469 

1800 

###### 

3600 

QS 

3,9E+03 

###### 

3600 

Multigrid 

4672.2500 

1800 

3,2E+00 

###### 

3600 
Table A.4: Primary preconditioner/solver benchmark results for $\kappa = 1$, $V_{scale} = 1$ using
HDG discretization. Red highlighting indicates the solution did not converge. The
fastest simulation for a given CFL number is highlighted in green, and the iteration
with the fewest matrix-vector multiplications is outlined in orange.


Entries marked -- are missing or illegible in the source. The GmresR iteration counts for dt=0.2 did not survive extraction.

Precond. | Slash Time | GmresR Time | GmresR Iter. | GmresR Resid. | Qmr Time | Qmr Iter. | Qmr Resid. | BiCgstab Time | BiCgstab Iter. | BiCgstab Resid. | Min Time | Min MV

dt=0.2
None | 0.4219 | 0.6563 | -- | 9.0E-07 | 0.5000 | 142 | 9.2E-07 | 0.4531 | 90 | 5.0E-09 | 0.4531 | 180
Upper | 0.4063 | 0.2656 | -- | 8.2E-07 | 0.5156 | 58 | 6.1E-07 | 0.2813 | 30 | 4.6E-08 | 0.2656 | 60
Lower | 0.4375 | 0.2656 | -- | 7.4E-07 | 0.4844 | 55 | 6.4E-07 | 0.2500 | 30 | 2.4E-08 | 0.2500 | 60
Jacobi | 0.4375 | 0.3438 | -- | 9.7E-07 | 0.3750 | 86 | 6.6E-07 | 0.3750 | 60 | 9.3E-10 | 0.3438 | 120
ILU0 | 0.4375 | 0.1563 | -- | 9.5E-07 | 0.3750 | 31 | 8.1E-07 | 0.3438 | 30 | 1.8E-12 | 0.1563 | 60
BlockILU | 0.4375 | 0.2969 | -- | 9.7E-07 | 0.5000 | 73 | 7.4E-07 | 0.2500 | 30 | 7.2E-07 | 0.2500 | 60
BlockJacobi | 0.4375 | 0.4063 | -- | 9.7E-07 | 0.7344 | 73 | 8.0E-07 | 0.3594 | 30 | 7.2E-07 | 0.3594 | 60
BlockJacobi2 | 0.4375 | 0.6563 | -- | 9.7E-07 | 1.2344 | 73 | 8.0E-07 | 2.0000 | 30 | 7.2E-07 | 0.6563 | 60
GS | 0.4219 | 0.3750 | -- | 8.9E-07 | 0.54688 | 49 | 4.6E-07 | 3.75 | 30 | 3.2E-09 | 0.3750 | 60
Multigrid | 0.4219 | 3.2656 | -- | 5.4E-07 | -- | -- | -- | 86.7969 | 30 | 3.1E-10 | 3.2656 | 60

dt=2
None | 0.4219 | 0.8281 | 269 | 8.3E-07 | -- | -- | -- | 0.5625 | 120 | 1.2E-07 | 0.5625 | 240
Upper | 0.4375 | 0.4531 | 142 | 6.9E-07 | 0.8438 | 99 | 4.9E-07 | 0.4531 | 60 | 2.3E-09 | 0.4531 | 120
Lower | 0.4375 | 0.4219 | 132 | 7.5E-07 | 0.7344 | 87 | 6.8E-07 | 0.3906 | 60 | 2.9E-10 | 0.3906 | 120
Jacobi | 0.4375 | 0.5938 | 197 | 9.7E-07 | 0.7031 | 160 | 8.0E-07 | 0.5000 | 90 | 9.0E-09 | 0.5000 | 180
ILU0 | 0.4375 | 0.2969 | 108 | 9.8E-07 | 0.6563 | 56 | 5.0E-07 | 0.3438 | 30 | 5.2E-07 | 0.2969 | 60
BlockILU | 0.4375 | 0.6094 | 171 | 8.8E-07 | 0.9844 | 146 | 1.0E-06 | 0.4531 | 70 | 3.1E-07 | 0.4531 | 140
BlockJacobi | 0.4375 | 0.7500 | 171 | 8.8E-07 | 1.3750 | 146 | 9.9E-07 | 0.6719 | 70 | 3.1E-07 | 0.6719 | 140
BlockJacobi2 | 0.4375 | 1.2188 | 171 | 8.8E-07 | 2.3281 | 146 | 9.9E-07 | 4.2656 | 70 | 3.1E-07 | 1.2188 | 140
GS | 0.4219 | 0.5781 | 127 | 7.8E-07 | 0.82813 | 84 | 9.6E-07 | 7.078125 | 60 | 5.1E-11 | 0.5781 | 120
Multigrid | 0.4219 | 8.4063 | 157 | 7.5E-07 | -- | -- | -- | 166.0000 | 60 | 7.9E-08 | 8.4063 | 120

dt=20 

0.3906 

0.4375 

0.4375 

0.4063 

0.4375 

0.4063 

0.4375 

0.4063 

0.4219 

0.4063 


dt=200 

0.3906 

0.4063 

0.4219 

0.4063 

0.4063 

0.4063 

0.3906 

0.4063 

0.3906 

0.3906 

dt=2000 

0.4063 

0.4063 

0.4063 

0.4063 

0.4219 

0.4063 

0.4375 

0.4063 

0.3906 

0.3906 


dt=20 

dt=20 

dt=20 

ctt=20 

None 

3,0469 

832 

9,8E-07 

^^^^HNnne 

1,8125 

380 

6,2E-07 

1,8125 

760 

Upper 

1.9531 

430 

9,6E-07 

UoDer 

^^^^lUDoer 

1.2500 

190 

2.6E-07 

1.2500 

380 

Lower 

1,4688 

340 

9,8E-07 

0,9375 

160 

1,6E-07 

0,9375 

320 

Jacobi 

2,8438 

733 

1,0E-06 

^^^^Hjacnbi 

1,6406 

330 

1,8E-07 

1,6406 

660 

iLUO 

0.8281 

207 

8,3E-07 

ILUO  1,8281  165 

6.4E-07iLU0  1 

0.7031 

90 

2.7E-07 

0.7031  Q 

180 

BlockILU 

2.3281 

510 

9,4E-07 

BlockILU 

BlockILU 

1.4531 

220 

3.8E-07 

1.4531 

440 

BlockJacobi 

3,1875 

510 

9,4E-07 

HInck  J  acohi 

1,9375 

220 

3,8E-07 

1,9375 

440 

BlockJacobi2 

4.6719 

510 

9,4E-07 

BlockJacobl2 

^^^^HBIockJacobl2 

13.0938 

220 

3.8E-07 

4.6719 

440 

GS 

1,6719 

291 

1,0E-06 

14,7344 

130 

2,9E-07 

1,6719 

260 

Multigrid 

60,0625 

757 

9,9E-07 

^^^^iMultigrid 

955,3438 

360 

4,2E-07 

60,0625 

720 

dt=200 

dt=200 

dt=200 

dt=200 

None 
Upper 
Lower 
Jacobi 
ILUO 
BlockILU 
BlockJacobi 
BlockJacobi2  | 
GS 

Multigrid 


None 
Upper 
Lower 
Jacobi 
ILUO 
BlockILU 
BlockJacobi 
BlockJacobi2  | 
GS 

Muitigrid 


18.2656  3454 
12.4375  2316 


9,9E-07 

1,0E-06 


5.1406  896  1,0E-06 


10.7969  1743 


None 
Upper 
Lower 
Jacobi 
ILUO 
BlockILU 
BlockJacobi 
BlockJacobi2  | 
9,9E-07|GS 


■  None 
lUpper 

■  Lower 
|jacobi 

■ILUO  _ 

■BlockILU 

iBIockJacobi 

■  BlockJacobi2 

Igs 


5.0781  1140 
3.6250  590 
2.8906  480 
4.8438  1050 
1.6875  250 
4.2500  740 
6.8750  740 
42.5938  740 
43.8125  390 


5.4E-07 

8.1E-07 

1.4E-07 

9.2E-07 

2.5E-07 

9.6E-07 

9.6E-07 

9.9E-07 

5.7E-07 


5.0781  2280 
3.6250  1180 
2.8906  960 
4.8438  2100 
1  6875r^00 
4.2500  1480 
6.8750  1480 
42.5938  1480 
10.7969  780 


Muitigrid 

M  u  1  ti  q  r  i  d 

3377.0313 

1280 

4.5E-07 

######  2560 

dt=2000  1 

dt=2000 

dt=2000 

None 

None 

4.5469 

1500 

1.7E-07 

4.5469  3000 

Upper 

Uooer 

4.1875 

740 

7.5E-07 

4.1875  1480 

Lower 

3,4688 

610 

8,9E-07 

3,4688  1220 

Jacobi 

6.7813 

1420 

9.4E-07 

6.7813  2840 

ILUO 

2,2344 

340 

3,6E-09 

2,2344|  680 

BlockILU 

^^^^^^^^^■Rlnckll  U 

5.3906 

960 

2.3E-07 

5.3906  1920 

BlockJacobi 

Block  J  acobi 

9.0156 

960 

2.3E-07 

9.0156  1920 

BlockJacobi2 

Block  J  acobl2 

54.4375 

950 

4.5E-07 

54.4375  1900 

GS 

1,6E-07 

54,5469  980 

Multigrid 

u  1  tl  g  r  i  d 

4527.0156 

1740 

6.6E-07 

######  3480 



Table A.5: Primary Preconditioner/Solver benchmark results for κ = 1, V_scale = 0
using LDG discretization. Red highlighting indicates the solution did not converge.
The fastest simulation for a given CFL number is highlighted in green, and the
iteration with the fewest matrix-vector multiplications is outlined in orange.


Slash   Precond.       GmresR                   Qmr                      BiCgstab                 Summary
Time                   Time     Iter. Resid.    Time     Iter. Resid.    Time     Iter. Resid.    Min Time  Min MV

dt=0.2
0.5781  None           0.5156   62    1.0E-06   0.5781   43    9.3E-07   0.4375   30    4.0E-08   0.4375    60
0.5938  Upper          0.2813   40    7.9E-07   0.9688   29    7.2E-07   0.4844   20    5.2E-07   0.2813    40
0.5938  Lower          0.2969   36    9.3E-07   0.7344   24    4.8E-07   0.3594   20    4.8E-09   0.2969    36
0.6094  Jacobi         0.3594   49    9.6E-07   0.6406   40    4.1E-07   0.4063   30    2.0E-07   0.3594    49
0.5938  ILU0           0.1406   27    7.5E-07   0.4219   9     5.2E-07   0.3906   10    2.0E-11   0.1406    18
0.5938  BlockILU       0.6406   43    7.8E-07   1.5000   35    4.0E-07   1.0781   30    4.0E-09   0.6406    43
0.5938  BlockJacobi    0.6094   43    7.8E-07   1.6563   35    4.0E-07   1.1094   30    4.0E-09   0.6094    43
0.5938  BlockJacobi2   1.1094   43    7.8E-07   2.4844   35    4.0E-07   9.4844   30    3.4E-09   1.1094    43
0.5938  GS             0.8750   34    5.2E-07   1.90625  19    7.1E-07   20.0625  20    4.0E-10   0.8750    34
0.5938  Multigrid      1.1875   29    5.9E-07   -        -     -         29.6719  10    1.3E-11   1.1875    20

A dash marks an entry for which no value appears.

dt=2 

dt=2 

0.7031 

None 

2.1094 

248 

9,9E-07 

0.6406 

Upper 

2.0000 

173 

9,8E-07 

0.6563 

Lower  i 

1.2031 

107 

9,6E-07 

0.6406 

Jacobi 

2.1094 

243 

9,9E-07 

0.6563 

ILUO  1 

0.6563 

BlockILU 

3.7188 

168 

9,8E-07 

0.6406 

BlockJacobi 

3.4531 

168 

9,8E-07 

0.6563 

BlockJacobi2 

5.6563 

168 

9,8E-07 

0.6406 

GS 

3.6094 

93 

9,5E-07 

0.6719 

Multigrid 

1.4063 

31 

7,0E-07 

None 
Upper 
Lower 
Jacobi 
ILUO 
BlockILU 
BlockJacobi 
BlockJacobi2  | 
GS 


iNone 

lupper 

I  Lower 

Ijacobi 

llLUO 

iBIockILU 

iBIockJacobi 

lBlockJacobi2 

Igs 


I  Multigrid 


1.3281 

110 

8.4E-07 

1.3281 

2.2969 

140 

2.7E-07 

2.0000 

1.2656 

80 

4.2E-07 

1.2031 

1.9844 

160 

7.0E-07 

1.9844 

6.8125 

370 

4.5E-07 

6.8125 

4.7969 

120 

4.4E-07 

3.7188 

4.2188 

120 

4.4E-07 

3.4531 

37.2344 

120 

5.3E-07 

5.6563 

67.203125 

70 

7.2E-07 

3.6094 

30.0313 

10 

3.6E-11 

1.40631 

dt=20 
6.4531  1200 
9.7656  1200 
8.5625  1200 
7.3125  1200 
10.9688  1200 
3  23.7813  1200 
5  20.7344  1200 
######  1200 
######  1120 
■  1  54691  20 


dt=200 

dt=200 

dt=200  1 

dt=200 

dt=200 

0.7344 

None 

None 

6.8906 

600 

6,6E-h00 

6.8906 

1200 

0.6563 

Upper 

U  0  p  e  r 

Upoer 

9.9688 

600 

3.8E-05 

9.9688 

1200 

0.6563 

Lower 

8.5625 

600 

7.4E-05 

8.5625 

1200 

0.7188 

Jacobi 

^^^^^^^Eacobi 

6.8281 

600 

2.3E-05 

6.8281 

1200 

0.6250 

ILUO 

^^^^^■ll  UO 

UO 

10.8438 

600 

1.0E-01 

10.8438 

1200 

0.6719 

BlockILU 

BlockI  LU 

BlockI  LU 

23.9844 

600 

6.3E-06 

23.9844 

1200 

0.6406 

BlockJacobi 

^^^^^^^HHInckJacnhl 

HInck  J  acohl 

20.9063 

600 

6.3E-06 

20.9063 

1200 

0.6250 

0.6563 

BlockJacobi2 

GS 

BlockJ  acobl2 

Block  J  acobl2 

184.6875 

600 

6.0E-06 

2.0E-04 

###### 

###### 

1200 

1200 

0.6563 

1.5625 

32  9,2E-07lMultiarid 

30.3594 

10 

8.  IE-09 

1.5625 

20 

dt=2000 

0.7188 

0.6406 

0.6563 

0.6406 

0.6250 

0.6563 

0.6563 

0.6250 

0.6250 

0.6250 


None 

Upper 

Lower 

Jacobi 

ILUO 

BlockILU 

BlockJacobi 

BlockJacobi2 

G_S _ 

Multigrid 


None 

Upper 

Lower 

iJacobi 

ILUO 

BlockILU 

BlockJacobi 

BlockJacobi2 

GS 

Multigrid 


Lower 

iJacobi 

ILUO 

BlockILU 

BlockJacobi 

BlockJacobi2 

|GS 

Multigrid 


dt=2000  1 

1  dt=2000  1 

6.5000 

600 

2.8E-I-01 

6.5000 

1200 

9.71 88 

600 

3.4E-06 

9.7188 

1200 

8.3594 

600 

9.5E-05 

8.3594 

1200 

7.2969 

600 

2.5E-06 

7.2969 

1200 

10.9531 

600 

2.4E-04 

10.9531 

1200 

23.0781 

580 

9.6E-07 

23.0781 

1160 

19.8906 

580 

9.6E-07 

19.8906 

1160 

154.0625 

500 

9.8E-07 

###### 

1000 

2.5E-06 

###### 

1200 

30.4531 

10 

3.2E-11 

1.64061 

20 

1.6406 

20 
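
The two Summary columns of the table above can be recomputed from the per-solver columns. The following is a minimal Python sketch, not the MATLAB code used in this study. It assumes one matrix-vector multiplication per GmresR iteration and two per Qmr or BiCgstab iteration; that counting rule is inferred from the tabulated values and is not stated in the text. The names `MV_PER_ITER` and `summarize` are hypothetical, and convergence flags are ignored.

```python
# Assumed matrix-vector multiplications per iteration for each solver.
# Inferred from the tabulated Min MV values; not taken from the thesis text.
MV_PER_ITER = {"GmresR": 1, "Qmr": 2, "BiCgstab": 2}


def summarize(row):
    """Return (min time, min matrix-vector multiplies) over the solvers in a row.

    `row` maps a solver name to a (time in seconds, iterations) pair.
    """
    min_time = min(t for t, _ in row.values())
    min_mv = min(MV_PER_ITER[name] * iters for name, (_, iters) in row.items())
    return min_time, min_mv


# Values copied from the ILU0 and None rows of the dt=0.2 block above.
ilu0 = {"GmresR": (0.1406, 27), "Qmr": (0.4219, 9), "BiCgstab": (0.3906, 10)}
none = {"GmresR": (0.5156, 62), "Qmr": (0.5781, 43), "BiCgstab": (0.4375, 30)}

print(summarize(ilu0))  # (0.1406, 18)
print(summarize(none))  # (0.4375, 60)
```

Both results match the Min Time and Min MV entries listed for those rows, which is the only evidence for the assumed counting rule.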



Table A.6: Primary Preconditioner/Solver benchmark results for κ = 0, V_scale = 1
using LDG discretization. Red highlighting indicates the solution did not converge.
The fastest simulation for a given CFL number is highlighted in green, and the
iteration with the fewest matrix-vector multiplications is outlined in orange.


Slash   Precond.       GmresR                   Qmr                      BiCgstab                 Summary
Time                   Time     Iter. Resid.    Time     Iter. Resid.    Time     Iter. Resid.    Min Time  Min MV

dt=0.2
0.2969  None           0.4063   53    7.9E-07   0.3750   39    9.9E-07   0.4531   20    2.1E-07   0.3750    40
0.2500  Upper          0.3594   39    9.2E-07   0.5156   25    7.8E-07   0.3594   20    7.0E-08   0.3594    39
0.2813  Lower          0.2656   34    8.3E-07   0.3906   16    7.7E-07   0.3125   20    9.3E-11   0.2656    32
0.2500  Jacobi         0.2813   45    7.9E-07   0.3281   30    6.7E-07   0.3906   20    3.8E-07   0.2813    40
0.2656  ILU0           0.1250   24    6.6E-07   0.2031   5     2.6E-08   0.2656   10    4.4E-14   0.1250    10
0.2813  BlockILU       0.2188   26    9.9E-07   0.2656   7     3.0E-07   0.4063   10    1.5E-13   0.2188    14
0.2500  BlockJacobi    0.1875   26    9.9E-07   0.3594   7     3.0E-07   0.5156   10    1.5E-13   0.1875    14
0.2344  BlockJacobi2   0.2813   26    9.9E-07   0.3906   7     3.0E-07   2.4531   10    1.8E-13   0.2813    14
0.2344  GS             0.2031   25    1.6E-07   0.29688  6     1.4E-08   3.4375   10    2.3E-15   0.2031    12
0.2031  Multigrid      0.4063   23    5.8E-07   -        -     -         28.5000  10    3.5E-12   0.4063    20

A dash marks an entry for which no value appears.

dt=2 

0.2500 

0.1875 

0.2031 

0.2500 


None 

Upper 

Lower 

Jacobi 


0.2188 

ILUO 

0.2188 

31 

7,9E-07 

ILUO 

0.3281 

14 

4.4E-07 

ILUO  1 

0.2031 

BlockILU 

0.6094 

41 

6.6E-07 

BlockILU 

1.1406 

25 

4.9E-07 

BlockILU 

0.2188 

BlockJacobi 

0.6094 

41 

6,6E-07 

BlockJacobi 

0.9844 

25 

4.9E-07 

BlockJacobi 

0.1875 

BlockJacobi2 

0.5938 

41 

6,6E-07 

BlockJacobl2 

0.9531 

25 

4.9E-07 

BlockJacobl2 

0.1875 

GS 

0.3906 

33 

5.9E-07 

GS 

0.71 875 

17 

1.4E-07 

GS 

0.1875 

Multigrid 

0.5469 

25 

9,1E-07 

Multigrid 

Multiqrid 

dt=2 

1.4063  140 

105.2500  600 

7.0469  600  5.1E+00 

3.2500  290 
0.2969  10 

20 
20 
20 
10 
10 


0.8906 

0.7500 

4.5938 

3.609375 

29.2656 


7.0E-07 

3,4E-( 

3.5E-12 

3,5E-12 

3,5E-12 

6.3E-09 

7,5E-14 


dt=: 

2 

1.4063 

280 

###### 

1200 

7.0469 

1200 

3.2500 

580 

”0.2188] 

1  ^ 

0.6094 

40 

0.6094 

40 

0.5938 

40 

0.3906 

1  20 

0.5469 

US 

dt=20 

0.2500 

0.1875 

0.2031 

0.2500 


None 

Upper 

Lower 

Jacobi 


0.2500 

ILUO 

0.8438 

101 

9,8E-07 

0.2344 

BlockILU 

2.6875 

131 

9.2E-07 

0.2188 

BlockJacobi 

2.3125 

131 

9,2E-07 

0.2500 

BlockJacobi2 

2.5156 

131 

9,2E-07 

0.2188 

GS 

1.4219 

83 

8.6E-07 

0.2188 

Multigrid 

0.7344 

27 

5,6E-07 

dt=20 
5.5625 
72.8438 
84.7344 
5.9375 
0.9375 
2.7031 
2.3906 
14.71 88 
12.6094 
28.7500 


600 

600 

600 

600 

60 

70 

70 

70 


8.7E-01 

3,6E-08 

2.6E-07 

2,6E-07 

2,6E-07 

5.6E-07 

4,7E-15 


dt=20 

5.5625  1200 
72.8438  1200 
84.7344  1200 
5.9375  1200 
0.8438  101 

2.6875  131 

2.3125  131 

2.5156  131 

1.4219  80 

0,7344r 


20 


dt=200 

0.2500 

0.2031 

0.1875 

0.2188 


None 

Upper 

Lower 

Jacobi 


0.2188 

ILUO 

5.3125 

579 

9.8E-07 

0.2031 

BlockILU 

16.4688 

741 

1.0E-06 

0.2500 

BlockJacobi 

14.5469 

741 

1,0E-05 

0.2031 

BlockJacobi2 

15.4688 

741 

1.0E-06 

0.2188 

GS 

7.3750 

365 

9,8E-07 

0.1875 

Multigrid 

0.8125 

28 

5,8E-07 

ctt=2000 

0.2500 

0.1875 

0.1875 

0.1719 

0.2031 

0.1875 

0.2344 

0.2031 

0.2188 

0.1875 


dt=200 
5.1406  1200 
######  1200 
######  1200 
5.9688  1200 
2.4531  360 

11.3281 
9.8594 
15.4688 
7.3750 


600 

600 

600 

320 


dt=2000 

dt=2000 

None 

5.7031 

600 

5.4E+01 

5.7031 

1200 

Upper 

121.5469 

600 

###### 

1200 

Lower 

122.5313 

600 

###### 

1200 

Jacobi 

5.6094 

600 

2,0E-01 

5.6094 

1200 

ILUO 

6.7344 

550 

9.  IE-07 

6.7344 

1100 

BlockILU 

22.5625 

600 

6,3E-06 

22.5625 

1200 

BlockJacobi 

20.0000 

600 

6.3E-06 

20.0000 

1200 

BlockJacobl2 

124.2344 

600 

7.0E-06 

###### 

1200 

GS  1 

7,0E-07 

###### 

1000 

Multiqrid  | 

29.1563 

10 

1.8E-11 

■  0.921 9( 

20 

0.9219 

20

Table A.7: Primary Preconditioner/Solver benchmark results for κ = 1, V_scale = 1
using LDG discretization. Red highlighting indicates the solution did not converge.
The fastest simulation for a given CFL number is highlighted in green, and the
iteration with the fewest matrix-vector multiplications is outlined in orange.


Slash   Precond.       GmresR                   Qmr                      BiCgstab                 Summary
Time                   Time     Iter. Resid.    Time     Iter. Resid.    Time     Iter. Resid.    Min Time  Min MV

dt=0.2
0.5938  None           0.4688   66    9.6E-07   0.8438   50    7.3E-07   0.4063   30    1.2E-07   0.4063    60
0.6094  Upper          0.3438   42    7.5E-07   1.0313   32    6.1E-07   0.4219   20    8.2E-07   0.3438    40
0.5781  Lower          0.2969   37    9.3E-07   0.7031   26    3.9E-07   0.4063   20    1.2E-08   0.2969    37
0.5938  Jacobi         0.3750   49    7.3E-07   0.7344   43    9.5E-07   0.5313   30    2.5E-07   0.3750    49
0.5938  ILU0           0.1563   28    4.9E-07   0.6094   10    5.1E-07   0.2969   10    4.4E-11   0.1563    20
0.6094  BlockILU       0.5781   41    8.9E-07   1.4688   35    3.6E-07   1.0938   30    3.6E-09   0.5781    41
0.5938  BlockJacobi    0.5938   41    8.9E-07   1.6094   35    3.1E-07   1.1094   30    3.6E-09   0.5938    41
0.6094  BlockJacobi2   1.0625   41    8.9E-07   2.5000   35    3.1E-07   9.4844   30    3.6E-09   1.0625    41
0.5938  GS             0.7344   33    7.3E-07   1.92188  19    7.7E-07   19.84375 20    3.7E-10   0.7344    33
0.6094  Multigrid      1.0781   28    8.7E-07   -        -     -         29.9063  10    1.0E-12   1.0781    20

A dash marks an entry for which no value appears.

0.7031 

None 

3.1094 

343 

9.7E-07 

0.6406 

Upper 

2.5938 

206 

9.9E-07 

0.6094 

Lower 

1.1563 

105 

9.5E-07 

0.6719 

Jacobi 

2.2188 

254 

1.0E-06 

0.6250 

ILUO  1 

0.6250 

BlockILU 

3.0156 

133 

9.9E-07 

0.6406 

BlockJacobi 

2.6563 

133 

9.9E-07 

0.6250 

BlockJacobi2 

4.2656 

133 

9.9E-07 

0.6094 

GS 

3.0000 

80 

8.9E-07 

0.6563 

Multiqrid 

1 .4375 

31 

9.7E-07 

None 
Upper 
Lower 
Jacobi 
ILUO 
BlockILU 
BlockJacobi 
BlockJacobi2  | 
GS 

Multigrid 


iNone 

lUpper 

I  Lower 

Ijacobi 

llLUO 

iBIockILU 

iBIockJacobi 

lBlockJacobi2 

Igs 


I  Multigrid 


“! 


dt=2 

1.9688  170 

2.2188  130 

1.2500  80 

1.9375  160 

10.9531  600 

3.9688  100 

3.6406  100 

30.9219  100 

58.375  60 

29.9219  10 


ctt=20 

1  ^dt=20^^^^^^| 

0.6563 

None 

0.6250 

Upper 

9.9688 

788 

1.0E-06 

0.6250 

Lower 

5.4688 

469 

1.0E-06 

0.6875 

Jacobi 

8.2656 

906 

1.0E-06 

0.6250 

ILUO 

0.6250 

BlockILU 

13.0469 

551 

1.0E-06 

0.6406 

BlockJacobi 

1 1 .5469 

551 

1.0E-06 

0.6250 

BlockJacobi2 

19.7188 

551 

1.0E-06 

0.6250 

GS 

13.9688 

317 

1.0E-06 

0.6250 

Multiqrid 

1.6719 

33 

6.8E-07 

ctt=20 

6.4531 

6.7656 

3.8594 

6.4688 

10.9531 

16.7813 

15.0469 

126.3750 

219.0156 

30.3281 


6.7E-04 
3.7E-07 
9.7E-07 
8.0E-07 
2.0E-02  • 
1.4E-07  1 
1.4E-07  1 
9.6E-07  1 
2.3E-07  1 
5.7E-09  ' 


Clt=200 

0.7031 

0.5938 

0.6250 

0.6719 

0.6406 

0.6406 

0.6250 

0.6250 

0.6250 

0.6563 


None 

Upper 

Lower 

Jacobi 

ILUO 

BlockILU 

BlockJacobi 

BlockJacobi2 

GS 

Multigrid 


None 
Upper 
Lower 
iJacobi 
ILUO 
BlockILU 
BlockJacobi 
BlockJacobl2 
GS _ 

Multigrid 


clt=200 
6.8594  1200 
9.8906  1200 
8.6250  1200 
7.2344  1200 
11.0156  1200 
)  23.3125  1200 
5  21.0156  1200 
######  1200 
_######  840 

1.7656]  M 


ctt=2000 

0.6875 

0.6406 

0.6250 

0.6563 

0.6250 

0.6250 

0.6250 

0.6719 

0.6250 

0.6250 


None 

Upper 

Lower 

Jacobi 

ILUO 

BlockILU 

BlockJacobi 

BlockJacobi2 

S  _  _ 

Multigrid 


None 

Upper 

Lower 

Jacobi 

ILUO 

BlockILU 

BlockJacobi 

BlockJacobi2 

GS 

Multigrid 


Multigrid 


dt=2000  1 

1  dt=2000  1 

6.5625 

600 

1.9E+01 

6.5625 

1200 

9.8594 

600 

1.2E-04 

9.8594 

1200 

8.6563 

600 

2.5E-05 

8.6563 

1200 

7.2031 

600 

9.4E-05 

7.2031 

1200 

10.6875 

600 

3.6E-03 

10.6875 

1200 

23.6875 

600 

1.3E-05 

23.6875 

1200 

20.9063 

600 

1.3E-05 

20.9063 

1200 

184.8125 

600 

1.  IE-05 

###### 

1200 

1.4E-06 

###### 

1200 

1  31.5313 

10 

2.  IE-09 

1  7656] 

20

Table A.8: GMRES versus BiCGSTAB(l) restart benchmark results for κ = 1, V_scale = 0
using LDG discretization. Red highlighting indicates the solution did not converge.
The fastest simulation for a given CFL number is highlighted in green, and the
iteration with the fewest matrix-vector multiplications is outlined in orange.


GmresR                                    BiCgstab                                  Summary
Restarts  Time     Iter.  Resid.    Flag  Restarts  Time     Iter.  Resid.    Flag  Min Time  Min MV

dt=0.2
2         0.1875   10     4.5E-07   0     1         0.2344   6      9.8E-07   0     0.1875    10
4         0.1875   11     8.4E-07   0     2         0.2500   6      5.6E-07   0     0.1875    11
8         0.1719   15     7.5E-07   0     4         0.2813   8      1.6E-08   0     0.1719    15
10        0.1719   17     7.5E-07   0     5         0.3125   10     3.5E-11   0     0.1719    17
12        0.2031   19     7.5E-07   0     6         0.2656   6      2.4E-07   0     0.2031    12
14        0.1719   21     7.5E-07   0     7         0.2813   7      1.1E-07   0     0.1719    14
16        0.1719   23     7.5E-07   0     8         0.3125   8      1.0E-08   0     0.1719    16
18        0.2031   25     7.5E-07   0     9         0.3438   9      2.6E-10   0     0.2031    18
20        0.1719   27     7.5E-07   0     10        0.3438   10     2.0E-11   0     0.1719    20
24        0.1719   31     7.5E-07   0     12        0.4375   12     4.6E-13   0     0.1719    24
30        0.1719   37     7.5E-07   0     15        0.5156   15     7.6E-13   0     0.1719    30
40        0.1719   47     7.5E-07   0     20        0.7031   20     2.9E-14   0     0.1719    40
50        0.1719   57     7.5E-07   0     25        0.9531   25     1.7E-14   0     0.1719    50
Table A.9: GMRES versus BiCGSTAB(l) restart benchmark results for κ = 0, V_scale = 1
using LDG discretization. Red highlighting indicates the solution did not converge.
The fastest simulation for a given CFL number is highlighted in green, and the
iteration with the fewest matrix-vector multiplications is outlined in orange.


GmresR                                    BiCgstab                                  Summary
Restarts  Time     Iter.  Resid.    Flag  Restarts  Time     Iter.  Resid.    Flag  Min Time  Min MV

dt=0.2
2         0.1719   6      8.3E-07   0     1         0.3281   3      7.3E-07   0     0.1719    6
4         0.1094   8      6.6E-07   0     2         0.2031   4      6.4E-10   0     0.1094    8
8         0.1250   12     6.6E-07   0     4         0.1719   4      5.2E-11   0     0.1250    8
10        0.0938   14     6.6E-07   0     5         0.1875   5      5.0E-13   0     0.0938    10
12        0.1250   16     6.6E-07   0     6         0.2344   6      9.0E-15   0     0.1250    12
14        0.1250   18     6.6E-07   0     7         0.2344   7      1.5E-16   0     0.1250    14
16        0.1250   20     6.6E-07   0     8         0.2969   8      1.0E-14   0     0.1250    16
18        0.0938   22     6.6E-07   0     9         0.2188   9      2.3E-13   0     0.0938    18
20        0.1250   24     6.6E-07   0     10        0.2344   10     4.4E-14   0     0.1250    20
24        0.1250   28     6.6E-07   0     12        0.3594   12     2.0E-14   0     0.1250    24
30        0.1250   34     6.6E-07   0     15        0.4219   15     4.5E-16   0     0.1250    30
40        0.1250   44     6.6E-07   0     20        0.5625   20     9.0E-17   0     0.1250    40
50        0.1406   54     6.6E-07   0     25        0.8750   25     1.1E-16   0     0.1406    50
                                                                                    0.0938    6

dt=2
2         0.1094   14     9.8E-07   0     1         0.2344   8      1.1E-07   0     0.1094    14
4         0.1563   15     8.1E-07   0     2         0.2188   8      1.4E-07   0     0.1563    15
8         0.1719   19     8.2E-07   0     4         0.2500   8      9.0E-08   0     0.1719    16
10        0.1406   21     8.1E-07   0     5         0.2969   10     2.8E-09   0     0.1406    20
12        0.2031   23     7.9E-07   0     6         0.2500   12     4.1E-12   0     0.2031    23
14        0.2188   25     7.9E-07   0     7         0.2656   7      5.7E-07   0     0.2188    14
16        0.1563   27     7.9E-07   0     8         0.2188   8      5.6E-08   0     0.1563    16
18        0.2500   29     7.9E-07   0     9         0.2188   9      2.0E-08   0     0.2188    18
20        0.2188   31     7.9E-07   0     10        0.3125   10     3.4E-08   0     0.2188    20
24        0.1875   35     7.9E-07   0     12        0.3438   12     3.0E-10   0     0.1875    24
30        0.1719   41     7.9E-07   0     15        0.3906   15     1.2E-10   0     0.1719    30
40        0.1875   51     7.9E-07   0     20        0.5313   20     6.6E-11   0     0.1875    40
50        0.1719   61     7.9E-07   0     25        0.6719   25     3.3E-11   0     0.1719    50
                                                                                    0.1094    14

dt=20
2         0.9063   92     8.8E-07   0     1         0.7656   59     4.8E-07   0     0.7656    92
4         0.8750   93     9.6E-07   0     2         0.5781   50     6.9E-07   0     0.5781    93
8         0.8125   96     9.2E-07   0     4         0.5469   48     8.0E-07   0     0.5469    96
10        0.8906   96     8.9E-07   0     5         0.6406   50     8.4E-07   0     0.6406    96
12        0.9063   96     9.6E-07   0     6         0.6563   48     7.9E-07   0     0.6563    96
14        0.8125   97     9.2E-07   0     7         0.6719   49     8.0E-07   0     0.6719    97
16        1.0000   101    9.1E-07   0     8         0.6250   48     7.9E-07   0     0.6250    96
18        0.9688   102    9.3E-07   0     9         0.8125   54     6.6E-07   0     0.8125    102
20        0.9531   101    9.8E-07   0     10        0.9688   60     3.6E-08   0     0.9531    101
24        0.9219   105    8.9E-07   0     12        0.8906   48     9.0E-07   0     0.8906    96
30        0.9219   107    9.4E-07   0     15        1.0781   60     4.2E-08   0     0.9219    107
40        1.2344   115    9.0E-07   0     20        1.6563   60     3.7E-07   0     1.2344    115
50        1.2188   126    9.1E-07   0     25        2.0625   75     6.8E-08   0     1.2188    126

dt=200
2         5.6406   662    9.8E-07   0     1         2.5000   267    6.4E-07   0     2.5000    534
4         4.8594   633    1.0E-06   0     2         1.8594   192    8.4E-07   0     1.8594    384
8         5.0313   634    9.9E-07   0     4         1.8594   184    8.9E-07   0     1.8594    368
10        5.1719   636    1.0E-06   0     5         1.8125   180    8.4E-07   0     1.8125    360
12        5.1250   604    1.0E-06   0     6         1.8438   180    5.8E-07   0     1.8438    360
14        5.0625   582    9.8E-07   0     7         2.1875   182    5.6E-07   0     2.1875    364
16        5.4844   582    1.0E-06   0     8         2.1563   184    4.8E-07   0     2.1563    368
18        5.2656   555    1.0E-06   0     9         2.2969   180    7.0E-07   0     2.2969    360
20        5.5156   579    9.8E-07   0     10        2.3438   180    6.0E-07   0     2.3438    360
24        5.1875   499    1.0E-06   0     12        2.7656   180    5.6E-07   0     2.7656    360
30        5.6719   521    9.9E-07   0     15        2.9844   180    9.8E-07   0     2.9844    360
40        5.3906   422    9.9E-07   0     20        4.9688   200    3.1E-07   0     4.9688    400
50        5.6094   381    9.9E-07   0     25        6.4531   250    4.2E-07   0     5.6094    381
                                                                                    1.8125    360

dt=2000
Table A.10: GMRES versus BiCGSTAB(l) restart benchmark results for κ = 1, V_scale = 1
using LDG discretization. Red highlighting indicates the solution did not converge.
The fastest simulation for a given CFL number is highlighted in green, and the
iteration with the fewest matrix-vector multiplications is outlined in orange.


GmresR                                    BiCgstab                                  Summary
Restarts  Time     Iter.  Resid.    Flag  Restarts  Time     Iter.  Resid.    Flag  Min Time  Min MV

dt=0.2
2         0.2031   10     1.0E-06   0     1         0.2813   7      2.2E-07   0     0.2031    10
4         0.2188   12     5.5E-07   0     2         0.3125   8      2.1E-08   0     0.2188    12
8         0.2031   16     4.9E-07   0     4         0.2969   8      1.7E-08   0     0.2031    16
10        0.1875   18     4.9E-07   0     5         0.3594   10     6.9E-11   0     0.1875    18
12        0.2031   20     4.9E-07   0     6         0.3125   6      7.6E-07   0     0.2031    12
14        0.1875   22     4.9E-07   0     7         0.2813   7      7.5E-08   0     0.1875    14
16        0.1719   24     4.9E-07   0     8         0.3125   8      1.4E-08   0     0.1719    16
18        0.1875   26     4.9E-07   0     9         0.3281   9      5.9E-10   0     0.1875    18
20        0.1875   28     4.9E-07   0     10        0.3438   10     4.4E-11   0     0.1875    20
24        0.1875   32     4.9E-07   0     12        0.3750   12     9.3E-12   0     0.1875    24
30        0.1875   38     4.9E-07   0     15        0.5313   15     5.0E-12   0     0.1875    30
40        0.2031   48     4.9E-07   0     20        0.7500   20     6.3E-14   0     0.2031    40
50        0.2031   58     4.9E-07   0     25        0.8594   25     3.9E-11   0     0.2031    50
Appendix B


Description of MATLAB
functions / scripts

B.1 Functions and Helper Scripts for Implicit Integration

The following is a list, with short descriptions, of some of the relevant scripts and
functions written for this study.

•  Bench.m: This is the main program function. It accepts the various solver and
problem parameters and outputs the computation time, error flag, residual,
iterations until convergence, and the problem matrix. It is set up to perform
multiple integrations (advancing the solution in time) for multiple constituents.
It either computes the preconditioner and problem matrix or loads them from a
file. Using multiple switch statements, it chooses between the different solvers
and preconditioners. The computation time is measured from the onset of the time
integration until its completion, and does not include the time to calculate the
preconditioner or problem matrix. The rationale behind this is that the formation
of the matrix and preconditioners in this implementation would not be
representative of the computational time of an optimized implementation, and
therefore cannot be compared (in terms of time) to a MATLAB-implemented
preconditioner such as ILU(0).


•  BenchDriver.m: This is a driver script containing multiple loops that call
Bench.m with various input parameters. There are multiple versions of the
driver script for the different numerical experiments, but each has essentially
the same structure with slight differences.

•  BenchXcelOutput.m: Script to convert MATLAB output to Excel for ease of
analysis.

•  Preconditioner.m: This function implements some of the preconditioners, and
computes sparse matrix preconditioners for use in the solvers.

•  BlockPreconditioner.m: This function implements some of the preconditioners,
and is passed as a function for use in the solver, returning the product

•  MG.m: Script used to implement a more advanced MG preconditioner with an
ILU(0) preconditioner.
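
The division of labour described above (a benchmark function that times only the solve, and a driver that loops over parameters) can be sketched as follows. This is a minimal Python illustration, not the MATLAB code written for this study. The functions `build_matrix`, `build_preconditioner`, `bench`, and `bench_driver` are hypothetical names. The test matrix (identity plus dt times a 1D Laplacian) and the solver (a preconditioned Richardson iteration) are simple stand-ins assumed here so that the example is self-contained; they are not the LDG system or the GMRES, QMR, and BiCGSTAB solvers benchmarked in the tables.

```python
import itertools
import math
import time


def build_matrix(n, dt):
    """Hypothetical stand-in problem: I + dt * (1D Laplacian), stored densely."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0 + 2.0 * dt
        if i > 0:
            A[i][i - 1] = -dt
        if i < n - 1:
            A[i][i + 1] = -dt
    return A


def build_preconditioner(A, kind):
    """Return a function applying M^{-1} to a vector."""
    if kind == "None":
        return lambda r: list(r)
    if kind == "Jacobi":
        inv_diag = [1.0 / A[i][i] for i in range(len(A))]
        return lambda r: [d * ri for d, ri in zip(inv_diag, r)]
    raise ValueError(f"unknown preconditioner: {kind}")


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def bench(n, dt, precond, tol=1e-6, maxit=600):
    """Time only the iterative solve; matrix and preconditioner set-up are excluded."""
    A = build_matrix(n, dt)
    apply_minv = build_preconditioner(A, precond)  # set-up, not timed
    b = [1.0] * n
    x = [0.0] * n
    bnorm = norm(b)
    flag, iters = 1, 0  # flag 0 means converged, 1 means not converged
    start = time.perf_counter()
    while True:
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        resid = norm(r) / bnorm
        if resid < tol:
            flag = 0
            break
        if iters == maxit or not math.isfinite(resid):
            break
        # Richardson update: x <- x + M^{-1} r (stand-in for a Krylov solver)
        x = [xi + zi for xi, zi in zip(x, apply_minv(r))]
        iters += 1
    elapsed = time.perf_counter() - start
    return {"time": elapsed, "flag": flag, "resid": resid, "iters": iters}


def bench_driver():
    """Loop over time steps and preconditioners, collecting one result per pair."""
    results = {}
    for dt, precond in itertools.product((0.2, 2.0), ("None", "Jacobi")):
        results[(dt, precond)] = bench(n=20, dt=dt, precond=precond)
    return results


if __name__ == "__main__":
    for (dt, precond), out in bench_driver().items():
        print(f"dt={dt:<5} {precond:<7} flag={out['flag']} "
              f"iters={out['iters']:<4} resid={out['resid']:.1e} "
              f"time={out['time']:.4f}s")
```

Starting the timer after the preconditioner has been built mirrors the choice described for Bench.m, so the reported time reflects the solve alone. In this stand-in, the unpreconditioned iteration diverges for dt = 2 and is reported with a nonzero flag, in the same spirit as the non-converged entries in the benchmark tables.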
Appendix C


Figures

C.1 Convergence plots for Multigrid Benchmark

[Plot: Convergence versus Matrix-Vector Multiplies: Advection Only]
Figure C-1: Convergence history of the GMRES(m) solver using different preconditioners.
Here the naive MG preconditioner is used to precondition the GMRES(m) smoother
for a pure advection case with a proper p-MG implementation. Fourth order basis
functions are used on the fine grid.


[Plot: Convergence versus Matrix-Vector Multiplies: Diffusion Only]
Figure C-2: Convergence history of the GMRES(m) solver using different preconditioners.
Here the naive MG preconditioner is used to precondition the GMRES(m) smoother
for a pure diffusion case with a proper p-MG implementation. Fourth order basis
functions are used on the fine grid.

[Plot: Convergence versus Matrix-Vector Multiplies: Advection-Diffusion. Legend:
GMRES(5) + MG; GMRES(20) + ILU(0); GMRES(20) MG naive. Horizontal axis:
Matrix-Vector Multiplies.]
Figure C-3: Convergence history of the GMRES(m) solver using different preconditioners.
Here the naive MG preconditioner is used to precondition the GMRES(m) smoother
for an advection-diffusion case with a proper p-MG implementation. Fourth order
basis functions are used on the fine grid.


[Plot: Convergence versus Matrix-Vector Multiplies: Advection Only]
Figure C-4: Convergence history of the GMRES(m) solver using different preconditioners.
Here no preconditioner is used to precondition the GMRES(m) smoother for a pure
advection case with a proper p-MG implementation. Fourth order basis functions are
used on the fine grid.

[Plot: Convergence versus Matrix-Vector Multiplies: Diffusion Only]
Figure C-5: Convergence history of the GMRES(m) solver using different preconditioners.
Here no preconditioner is used to precondition the GMRES(m) smoother for a pure
diffusion case with a proper p-MG implementation. Fourth order basis functions are
used on the fine grid.


Figure C-6: Convergence history of the GMRES(m) solver using different preconditioners.
Here no preconditioner is used to precondition the GMRES(m) smoother for an
advection-diffusion case with a proper p-MG implementation. Fourth order basis
functions are used on the fine grid.

Figure C-7: Convergence history of the GMRES(m) solver using different preconditioners.
Here the ILU(0) preconditioner is used to precondition the GMRES(m) smoother
for a pure advection case with a proper p-MG implementation. Fourth order basis
functions are used on the fine grid.

Figure C-8: Convergence history of the GMRES(m) solver using different preconditioners.
Here the ILU(0) preconditioner is used to precondition the GMRES(m) smoother for a
pure diffusion case with a proper p-MG implementation. Fourth order basis functions
are used on the fine grid.

Figure C-9: Convergence history of the GMRES(m) solver using different preconditioners.
Here the ILU(0) preconditioner is used to precondition the GMRES(m) smoother for
an advection-diffusion case with a proper p-MG implementation. Fourth order basis
functions are used on the fine grid.


[Plot: Convergence versus Matrix-Vector Multiplies: Advection Only]
Figure C-10: Convergence history of the GMRES(m) solver using different
preconditioners. Here the naive MG preconditioner is used to precondition the GMRES(m)
smoother for a pure advection case with a proper p-MG implementation. Second
order basis functions are used on the fine grid.

[Plot: Convergence versus Matrix-Vector Multiplies: Diffusion Only. Legend:
GMRES(5) + MG; GMRES(20) + ILU(0); GMRES(20) MG naive. Horizontal axis:
Matrix-Vector Multiplies, 0 to 300.]
Figure C-11: Convergence history of the GMRES(m) solver using different
preconditioners. Here the naive MG preconditioner is used to precondition the GMRES(m)
smoother for a pure diffusion case with a proper p-MG implementation. Second order
basis functions are used on the fine grid.


[Plot: Convergence versus Matrix-Vector Multiplies: Advection-Diffusion. Legend:
GMRES(5) + MG; GMRES(20) + ILU(0); GMRES(20) MG naive. Horizontal axis:
Matrix-Vector Multiplies.]
Figure C-12: Convergence history of the GMRES(m) solver using different
preconditioners. Here the naive MG preconditioner is used to precondition the GMRES(m)
smoother for an advection-diffusion case with a proper p-MG implementation. Second
order basis functions are used on the fine grid.

Figure C-13: Convergence history of the GMRES(m) solver using different
preconditioners. Here no preconditioner is used to precondition the GMRES(m) smoother for a
pure advection case with a proper p-MG implementation. Second order basis
functions are used on the fine grid.

Figure C-14: Convergence history of the GMRES(m) solver using different
preconditioners. Here no preconditioner is used to precondition the GMRES(m) smoother for a
pure diffusion case with a proper p-MG implementation. Second order basis functions
are used on the fine grid.

Figure C-15: Convergence history of the GMRES(m) solver using different
preconditioners. Here no preconditioner is used to precondition the GMRES(m) smoother for
an advection-diffusion case with a proper p-MG implementation. Second order basis
functions are used on the fine grid.

Figure C-16: Convergence history of the GMRES(m) solver using different
preconditioners. Here the ILU(0) preconditioner is used to precondition the GMRES(m) smoother
for a pure advection case with a proper p-MG implementation. Second order basis
functions are used on the fine grid.

Figure C-17: Convergence history of the GMRES(m) solver using different
preconditioners. Here the ILU(0) preconditioner is used to precondition the GMRES(m) smoother
for a pure diffusion case with a proper p-MG implementation. Second order basis
functions are used on the fine grid.

Figure C-18: Convergence history of the GMRES(m) solver using different
preconditioners. Here the ILU(0) preconditioner is used to precondition the GMRES(m) smoother
for an advection-diffusion case with a proper p-MG implementation. Second order basis
functions are used on the fine grid.